All of the memory on one of our Oracle Linux VMs was exhausted, and the out-of-memory (OOM) killer started killing processes:
Jun 29 10:32:48 host kernel: OSWatcher.sh invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
OSWatcher was collecting the contents of /proc/meminfo, which proved invaluable for the analysis. Almost the whole memory was consumed by slabs: Slab grew from 2928720 kB (roughly 2.8 GiB) to 77052224 kB (roughly 73 GiB) within a couple of minutes:
grid@host:/u00/oracle/GI/gridbase/oracle.ahf/data/repository/suptools/host/oswbb/grid/archive/oswmeminfo> egrep "zzz|Slab|MemAvailable" host_meminfo_22.06.29.1000.dat
zzz ***Wed Jun 29 10:24:15 CEST 2022
MemAvailable: 50734336 kB
Slab: 2928720 kB
zzz ***Wed Jun 29 10:24:45 CEST 2022
MemAvailable: 50754784 kB
Slab: 2928428 kB
zzz ***Wed Jun 29 10:25:15 CEST 2022
MemAvailable: 50748708 kB
Slab: 2928400 kB
zzz ***Wed Jun 29 10:25:45 CEST 2022
MemAvailable: 4890828 kB
Slab: 47753392 kB
zzz ***Wed Jun 29 10:26:15 CEST 2022
MemAvailable: 853208 kB
Slab: 52054036 kB
zzz ***Wed Jun 29 10:26:45 CEST 2022
MemAvailable: 767924 kB
Slab: 52249192 kB
zzz ***Wed Jun 29 10:27:15 CEST 2022
MemAvailable: 750048 kB
Slab: 52442408 kB
zzz ***Wed Jun 29 10:27:45 CEST 2022
MemAvailable: 673256 kB
Slab: 52612572 kB
zzz ***Wed Jun 29 10:28:15 CEST 2022
MemAvailable: 692748 kB
Slab: 53122860 kB
zzz ***Wed Jun 29 10:28:45 CEST 2022
MemAvailable: 595100 kB
Slab: 53553064 kB
zzz ***Wed Jun 29 10:29:16 CEST 2022
MemAvailable: 568432 kB
Slab: 53969576 kB
zzz ***Wed Jun 29 10:29:46 CEST 2022
MemAvailable: 572700 kB
Slab: 54801556 kB
zzz ***Wed Jun 29 10:30:16 CEST 2022
MemAvailable: 615584 kB
Slab: 55338836 kB
zzz ***Wed Jun 29 10:30:46 CEST 2022
MemAvailable: 599560 kB
Slab: 55715568 kB
zzz ***Wed Jun 29 10:31:16 CEST 2022
MemAvailable: 555600 kB
Slab: 55861032 kB
zzz ***Wed Jun 29 10:31:47 CEST 2022
MemAvailable: 556200 kB
Slab: 56051892 kB
zzz ***Wed Jun 29 10:32:17 CEST 2022
MemAvailable: 491812 kB
Slab: 56165600 kB
zzz ***Wed Jun 29 10:32:48 CEST 2022
MemAvailable: 468500 kB
Slab: 56284980 kB
zzz ***Wed Jun 29 10:33:18 CEST 2022
MemAvailable: 2326488 kB
Slab: 74419980 kB
zzz ***Wed Jun 29 10:35:40 CEST 2022
MemAvailable: 227620 kB
Slab: 77052224 kB
zzz ***Wed Jun 29 10:38:11 CEST 2022
MemAvailable: 229028 kB
Slab: 77059908 kB
zzz ***Wed Jun 29 10:42:13 CEST 2022
MemAvailable: 227812 kB
Slab: 77067504 kB
zzz ***Wed Jun 29 10:48:35 CEST 2022
MemAvailable: 236408 kB
Slab: 77081200 kB
zzz ***Wed Jun 29 10:49:42 CEST 2022
MemAvailable: 224536 kB
Slab: 77095736 kB
zzz ***Wed Jun 29 10:54:03 CEST 2022
MemAvailable: 232752 kB
Slab: 77107492 kB
zzz ***Wed Jun 29 10:55:58 CEST 2022
MemAvailable: 222448 kB
Slab: 77118004 kB
zzz ***Wed Jun 29 10:57:48 CEST 2022
MemAvailable: 227048 kB
Slab: 77127064 kB
zzz ***Wed Jun 29 11:00:05 CEST 2022
MemAvailable: 215600 kB
Slab: 77137560 kB
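To spot the growth without eyeballing pairs of lines, the archive file can be condensed into one line per snapshot. A minimal awk sketch against the same file (prev starts at 0, so ignore the first delta):

awk '/^zzz/   { ts = $0 }                                            # remember the timestamp line
     /^Slab:/ { printf "%s  Slab: %d kB (%+d kB)\n", ts, $2, $2 - prev; prev = $2 }' \
  host_meminfo_22.06.29.1000.dat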
The next step would have been to break down the slab allocations to see which cache was causing the problem.
This can be done, for example, with slabtop (the outputs below were taken after the problem had disappeared):
slabtop
Active / Total Objects (% used) : 4618223 / 4645386 (99.4%)
Active / Total Slabs (% used) : 78179 / 78179 (100.0%)
Active / Total Caches (% used) : 101 / 141 (71.6%)
Active / Total Size (% used) : 826284.55K / 837056.44K (98.7%)
Minimum / Average / Maximum Object : 0.01K / 0.18K / 8.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
1029405 1029181 99% 0.20K 26395 39 211160K dentry
566076 566076 100% 0.09K 13478 42 53912K kmalloc-rcl-96
417280 417280 100% 0.01K 815 512 3260K kmalloc-8
378250 378250 100% 0.02K 2225 170 8900K avtab_node
334976 332731 99% 0.03K 2617 128 10468K kmalloc-32
303461 302987 99% 0.05K 4157 73 16628K Acpi-Parse
231040 231040 100% 0.06K 3610 64 14440K kmalloc-64
220672 220672 100% 0.02K 862 256 3448K kmalloc-16
200736 200489 99% 0.04K 1968 102 7872K avtab_extended_perms
195120 193680 99% 1.06K 6504 30 208128K xfs_inode
128632 128632 100% 0.57K 2297 56 73504K radix_tree_node
94878 94350 99% 0.19K 2259 42 18072K dmaengine-unmap-16
67524 66956 99% 0.62K 1324 51 42368K inode_cache
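slabtop is interactive by default; for a one-shot snapshot sorted by cache size, as is useful in scripts or when collecting evidence, the batch flags can be used:

slabtop -o -s c | head -20    # -o prints once and exits, -s c sorts by cache size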
Another way is vmstat -m:
vmstat -m | sort -k 3 -n -r
Cache Num Total Size Pages
dentry 1057538 1058109 208 39
kmalloc-rcl-96 566076 566076 96 42
kmalloc-8 417280 417280 8 512
avtab_node 378250 378250 24 170
kmalloc-32 335889 337024 32 128
Acpi-Parse 329632 330106 56 73
kmalloc-64 233600 233600 64 64
kmalloc-16 220928 220928 16 256
avtab_extended_perms 201042 201042 40 102
xfs_inode 195202 195240 1088 30
radix_tree_node 136808 136808 584 56
dmaengine-unmap-16 94920 94920 192 42
inode_cache 67493 67626 640 51
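When the leak is still active, repeating the command and highlighting differences makes the growing cache stand out. Note that vmstat -m also reads /proc/slabinfo under the hood, so this too must be run as root:

watch -n 5 -d 'vmstat -m | sort -k 3 -n -r | head -15'    # -d highlights changes between runs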
You can also read /proc/slabinfo directly:
cat /proc/slabinfo | sort -k 2 -n -r
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
dentry 1075951 1079676 208 39 2 : tunables 0 0 0 : slabdata 27684 27684 0
kmalloc-rcl-96 566160 566160 96 42 1 : tunables 0 0 0 : slabdata 13480 13480 0
kmalloc-8 417280 417280 8 512 1 : tunables 0 0 0 : slabdata 815 815 0
avtab_node 378250 378250 24 170 1 : tunables 0 0 0 : slabdata 2225 2225 0
Acpi-Parse 344990 349086 56 73 1 : tunables 0 0 0 : slabdata 4782 4782 0
kmalloc-32 339085 340352 32 128 1 : tunables 0 0 0 : slabdata 2659 2659 0
kmalloc-64 245855 245952 64 64 1 : tunables 0 0 0 : slabdata 3843 3843 0
kmalloc-16 230400 230400 16 256 1 : tunables 0 0 0 : slabdata 900 900 0
avtab_extended_perms 211344 211344 40 102 1 : tunables 0 0 0 : slabdata 2072 2072 0
xfs_inode 207539 207540 1088 30 8 : tunables 0 0 0 : slabdata 6918 6918 0
radix_tree_node 155786 156016 584 56 8 : tunables 0 0 0 : slabdata 2786 2786 0
dmaengine-unmap-16 100842 100842 192 42 2 : tunables 0 0 0 : slabdata 2401 2401 0
inode_cache 68388 68850 640 51 8 : tunables 0 0 0 : slabdata 1350 1350 0
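Because the raw file lists object counts and sizes, an approximate per-cache memory footprint can be computed directly. A minimal sketch (approximate because it ignores per-slab overhead; field 3 is <num_objs> and field 4 is <objsize> in bytes):

awk 'NR > 2 { printf "%10.1f MB  %s\n", $3 * $4 / 1024 / 1024, $1 }' /proc/slabinfo | sort -nr | head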
OSWatcher uses the last approach to gather slab information: it reads /proc/slabinfo.
Unfortunately, this file is readable only by root, so OSWatcher, which runs as grid, cannot collect the information:
ls -l /proc/slabinfo
-r--------. 1 root root 0 Jun 29 15:04 /proc/slabinfo
The purpose of this blog post is to show how to display slab allocations and to warn that you need to change the permissions on /proc/slabinfo so that OSWatcher can collect this information.
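For example, a quick fix as root; note that /proc is recreated at boot, so this change does not survive a reboot and has to be reapplied at startup, e.g. from a systemd oneshot unit or an init script:

chmod o+r /proc/slabinfo    # let non-root users such as grid read the slab statistics
ls -l /proc/slabinfo        # should now show: -r-----r--. 1 root root 0 ... /proc/slabinfo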