Slab
A VM hosting Oracle Clusterware 21.3 and databases on ACFS hung. It wasn’t just a one-off issue – the VM hung a couple of minutes after each Clusterware restart. Simply put, the VM became unusable.
Oracle OSWatcher provided the first clue – in only a couple of minutes, Slab rocketed from 2928720 kB to 74419980 kB, leaving almost no free memory:
egrep "zzz|Slab|MemAvailable" /u00/oracle/GI/gridbase/oracle.ahf/data/repository/suptools/host/oswbb/grid/archive/oswmeminfo/host.vt.ch_meminfo_22.06.29.1000.dat
zzz ***Wed Jun 29 10:24:15 CEST 2022
MemAvailable: 50734336 kB
Slab: 2928720 kB
zzz ***Wed Jun 29 10:24:45 CEST 2022
MemAvailable: 50754784 kB
Slab: 2928428 kB
zzz ***Wed Jun 29 10:25:15 CEST 2022
MemAvailable: 50748708 kB
Slab: 2928400 kB
zzz ***Wed Jun 29 10:25:45 CEST 2022
MemAvailable: 4890828 kB
Slab: 47753392 kB
zzz ***Wed Jun 29 10:26:15 CEST 2022
MemAvailable: 853208 kB
Slab: 52054036 kB
zzz ***Wed Jun 29 10:26:45 CEST 2022
MemAvailable: 767924 kB
Slab: 52249192 kB
zzz ***Wed Jun 29 10:27:15 CEST 2022
MemAvailable: 750048 kB
Slab: 52442408 kB
zzz ***Wed Jun 29 10:27:45 CEST 2022
MemAvailable: 673256 kB
Slab: 52612572 kB
zzz ***Wed Jun 29 10:28:15 CEST 2022
MemAvailable: 692748 kB
Slab: 53122860 kB
zzz ***Wed Jun 29 10:28:45 CEST 2022
MemAvailable: 595100 kB
Slab: 53553064 kB
zzz ***Wed Jun 29 10:29:16 CEST 2022
MemAvailable: 568432 kB
Slab: 53969576 kB
zzz ***Wed Jun 29 10:29:46 CEST 2022
MemAvailable: 572700 kB
Slab: 54801556 kB
zzz ***Wed Jun 29 10:30:16 CEST 2022
MemAvailable: 615584 kB
Slab: 55338836 kB
zzz ***Wed Jun 29 10:30:46 CEST 2022
MemAvailable: 599560 kB
Slab: 55715568 kB
zzz ***Wed Jun 29 10:31:16 CEST 2022
MemAvailable: 555600 kB
Slab: 55861032 kB
zzz ***Wed Jun 29 10:31:47 CEST 2022
MemAvailable: 556200 kB
Slab: 56051892 kB
zzz ***Wed Jun 29 10:32:17 CEST 2022
MemAvailable: 491812 kB
Slab: 56165600 kB
zzz ***Wed Jun 29 10:32:48 CEST 2022
MemAvailable: 468500 kB
Slab: 56284980 kB
zzz ***Wed Jun 29 10:33:18 CEST 2022
MemAvailable: 2326488 kB
Slab: 74419980 kB
ACFS buffer cache
slabtop provides a breakdown of the slab allocations by kmem cache:
slabtop
Active / Total Objects (% used) : 20455612 / 20599104 (99.3%)
Active / Total Slabs (% used) : 2798200 / 2798200 (100.0%)
Active / Total Caches (% used) : 127 / 184 (69.0%)
Active / Total Size (% used) : 76278221.22K / 76313158.69K (100.0%)
Minimum / Average / Maximum Object : 0.01K / 3.70K / 64.01K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
9176426 9176426 100% 0.28K 164002 56 2624032K acfs_ofsbuf_cache
9176260 9176260 100% 8.00K 2608768 4 83480576K acfs_buffer_cache_4k
420352 420352 100% 0.01K 821 512 3284K kmalloc-8
378250 378250 100% 0.02K 2225 170 8900K avtab_node
298880 294703 98% 0.03K 2335 128 9340K kmalloc-32
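For scripted collection, essentially the same breakdown can be obtained non-interactively, for example by letting slabtop print a single report sorted by cache size (a generic invocation, not part of the original data collection):
# print one slabtop report, sorted by cache size, and keep the top entries
slabtop -o -s c | head -15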
Practically all memory was consumed by the ACFS buffer cache. At this point, it wasn’t clear what was causing these extreme memory allocations. So, I started observing the slab growth in real time after restarting the VM:
while :
do
date
grep Slab /proc/meminfo
echo
sleep 1
done
Mon Jul 11 14:36:08 CEST 2022
Slab: 421856 kB
Mon Jul 11 14:36:10 CEST 2022
Slab: 421888 kB
Mon Jul 11 14:36:11 CEST 2022
Slab: 2733208 kB
Mon Jul 11 14:36:12 CEST 2022
Slab: 7733784 kB
Mon Jul 11 14:36:13 CEST 2022
Slab: 13214688 kB
Mon Jul 11 14:36:14 CEST 2022
Slab: 18859680 kB
Mon Jul 11 14:36:15 CEST 2022
Slab: 24406444 kB
Mon Jul 11 14:36:16 CEST 2022
Slab: 29065932 kB
Mon Jul 11 14:36:17 CEST 2022
Slab: 33006508 kB
Mon Jul 11 14:36:18 CEST 2022
Slab: 37035908 kB
Mon Jul 11 14:36:19 CEST 2022
Slab: 40773812 kB
Mon Jul 11 14:36:21 CEST 2022
Slab: 43771604 kB
Mon Jul 11 14:36:22 CEST 2022
Slab: 46700020 kB
While slab was increasing, I traced kernel memory allocations for a very short time:
/usr/share/bcc/tools/trace -T 't:kmem:kmem_cache_alloc' > trace 2> /dev/null
Simple postprocessing of the trace revealed that most allocations were made by the kernel thread acfsnimap:
grep 14:36:16 trace | awk '{print $4}' | sort | uniq -c | sort -k 1 -n -r
50962 acfsnimap
3872 acfssnap_tmp
1577 UsmRdRb-02
1288 kworker/4:1H
988 UsmRdRb-00
503 UsmRdRb-05
450 UsmRdRb-04
229 UsmRdRb-01
207 UsmRdRb-03
202 UsmRdRb-06
166 grep
128 bash
40 crscdpd.bin
32 asm_imr0_+asm1
26 kauditd
18 orarootagent.bi
17 auditd
12 osysmond.bin
Because of the substring “sn” in the name, I assumed that the kernel thread acfsnimap was involved in ACFS snapshot handling. Consequently, I inspected the ACFS log files.
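One way to find the relevant entries is a recursive search of the ACFS log directory (a sketch – the directory is the one referenced later in this post, the search pattern is my choice):
# look for recent snapshot-related entries in the ACFS logs
grep -ri snapshot /u00/oracle/GI/gridbase/crsdata/host/acfs/ | tail -20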
Corrupt snapshot
The last entries in /u00/oracle/GI/gridbase/crsdata/host/acfs/event.log.0 from when the issue first appeared indicated that some snapshots had been deleted:
2022-06-29 10:25:32.648 (5064080.797) : acfsutil.bin[1958274] Start deleting snapshot from ACFS file system F8: snapshot name:CDB_2022-06-16_15_39_37_PDB dev_path:/dev/asm/pdb_1-417 mountPoint:/u00/oracle/oradata01/CDB/PDB
2022-06-29 10:25:32.709 (5064080.858) : acfsutil.bin[1958274] Successfully deleted snapshot from ACFS file system F8: snapshot name:CDB_2022-06-16_15_39_37_PDB dev_path:/dev/asm/pdb_1-417 mountPoint:/u00/oracle/oradata01/CDB/PDB
2022-06-29 10:26:03.097 (5064111.246) : acfsutil.bin[1958752] Start deleting snapshot from ACFS file system F1: snapshot name:CDB_2022-06-16_15_39_37 dev_path:/dev/asm/cdb_1-417 mountPoint:/u00/oracle/oradata01/CDB
2022-06-29 10:26:05.264 (5064113.413) : acfsutil.bin[1958752] Successfully deleted snapshot from ACFS file system F1: snapshot name:CDB_2022-06-16_15_39_37 dev_path:/dev/asm/cdb_1-417 mountPoint:/u00/oracle/oradata01/CDB
According to the event log, the snapshots had been successfully deleted a long time ago. But the “acfsutil snap info” command was still showing “delete in progress” for one of them:
acfsutil snap info /u00/oracle/oradata01/CDB/PDB
snapshot name: CDB_2022-06-16_15_39_37_PDB (delete in progress)
snapshot creation time: Thu Jun 16 15:40:01 2022
number of snapshots: 0 (active)
1 (delete in progress)
kilosnap state: Enabled
storage interest tracking: Enabled
snapshot space usage: 130131570688 ( 121.19 GB )
In other words, even though “acfsutil snap delete” completed without errors, the snapshot had not actually been deleted. Whenever the file system was mounted, the pending delete bloated the slab until it consumed all the memory on the server.
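A quick way to find such a stuck snapshot on any mounted ACFS file system is to loop over the ACFS entries in /proc/mounts (my own sketch, built around the “acfsutil snap info” output shown above):
# check every mounted ACFS file system for snapshots stuck in "delete in progress"
awk '$3 == "acfs" {print $2}' /proc/mounts | while read -r mp
do
  echo "== $mp"
  acfsutil snap info "$mp" | grep -i "delete in progress"
done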
Another error message in acfs.log also indicated that something was wrong with the snapshot:
F2 4303519.747 2022-07-11 14:36:10.644 acfssnap_tmp[80720] OfsSnapStorageThread: FAILED CREATE loaded sfeta for snapId 0x1
This problem isn’t related to database load – the database wasn’t even running. Mounting the file systems is sufficient to reproduce the issue:
sudo /bin/mount -t acfs /dev/asm/cdb_1-417 /u00/oracle/oradata01/CDB
sudo /bin/mount -t acfs /dev/asm/pdb_1-417 /u00/oracle/oradata01/CDB/PDB
The allocated memory isn’t returned to the OS until a VM restart – it remains allocated even after unmounting the file system with the corrupt snapshot.
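This can be verified with a quick check (a sketch – the umount commands mirror the mount commands above, with the nested PDB file system unmounted first):
# unmount both file systems and see whether the slab shrinks (it does not)
sudo /bin/umount /u00/oracle/oradata01/CDB/PDB
sudo /bin/umount /u00/oracle/oradata01/CDB
grep Slab /proc/meminfo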
Recovery
I tried to limit the memory for the ACFS resource, but that didn’t work:
/u00/oracle/GI/product/21.3.0/bin/crsctl modify resource ora.data.pdb_1.acfs -attr WORKLOAD_MEMORY_MAX=1024 -unsupported
The first thing to do after hitting this problem is to identify the culprit. The relevant information is in the acfs and event log files in the /u00/oracle/GI/gridbase/crsdata/host/acfs directory. Then, you have to prevent the memory leak by disabling the file system with the corrupt snapshot and the database that uses it, for example:
/u00/oracle/GI/product/21.3.0/bin/crsctl modify resource ora.data.pdb_1.acfs -attr AUTO_START=never -unsupported
/u00/oracle/GI/product/21.3.0/bin/crsctl modify resource ora.pdb.db -attr AUTO_START=never -unsupported
Finally, you have to delete the file system, create a new one and restore the database.
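A rough sketch of the file system part, assuming the device and mount point names used above (the database restore itself, for example with RMAN, depends on your backup strategy):
# unmount the affected file system, drop it, and create a fresh one on the same volume
sudo /bin/umount /u00/oracle/oradata01/CDB/PDB
sudo /sbin/acfsutil rmfs /dev/asm/pdb_1-417
sudo /sbin/mkfs -t acfs /dev/asm/pdb_1-417
sudo /bin/mount -t acfs /dev/asm/pdb_1-417 /u00/oracle/oradata01/CDB/PDB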
Unfortunately, I’m still unaware of a way to get rid of a snapshot with a pending delete without dropping the whole file system. The snapshot directory /u00/oracle/oradata01/CDB/PDB/.ACFS/snaps was already empty.