Kernel Memory Leak due to ACFS Snapshots

slab

A VM hosting Oracle Clusterware 21.3 and databases on ACFS hung. It wasn’t just a one-off issue – the VM hung a couple of minutes after each Clusterware restart. Simply put, the VM became unusable.

Oracle OSWatcher provided the first clue – in only a couple of minutes, slab rocketed from 2928720 kB to 74419980 kB, leaving almost no free memory:

egrep "zzz|Slab|MemAvailable" /u00/oracle/GI/gridbase/oracle.ahf/data/repository/suptools/host/oswbb/grid/archive/oswmeminfo/host.vt.ch_meminfo_22.06.29.1000.dat
zzz ***Wed Jun 29 10:24:15 CEST 2022
MemAvailable:   50734336 kB
Slab:            2928720 kB
zzz ***Wed Jun 29 10:24:45 CEST 2022
MemAvailable:   50754784 kB
Slab:            2928428 kB
zzz ***Wed Jun 29 10:25:15 CEST 2022
MemAvailable:   50748708 kB
Slab:            2928400 kB
zzz ***Wed Jun 29 10:25:45 CEST 2022
MemAvailable:    4890828 kB
Slab:           47753392 kB
zzz ***Wed Jun 29 10:26:15 CEST 2022
MemAvailable:     853208 kB
Slab:           52054036 kB
zzz ***Wed Jun 29 10:26:45 CEST 2022
MemAvailable:     767924 kB
Slab:           52249192 kB
zzz ***Wed Jun 29 10:27:15 CEST 2022
MemAvailable:     750048 kB
Slab:           52442408 kB
zzz ***Wed Jun 29 10:27:45 CEST 2022
MemAvailable:     673256 kB
Slab:           52612572 kB
zzz ***Wed Jun 29 10:28:15 CEST 2022
MemAvailable:     692748 kB
Slab:           53122860 kB
zzz ***Wed Jun 29 10:28:45 CEST 2022
MemAvailable:     595100 kB
Slab:           53553064 kB
zzz ***Wed Jun 29 10:29:16 CEST 2022
MemAvailable:     568432 kB
Slab:           53969576 kB
zzz ***Wed Jun 29 10:29:46 CEST 2022
MemAvailable:     572700 kB
Slab:           54801556 kB
zzz ***Wed Jun 29 10:30:16 CEST 2022
MemAvailable:     615584 kB
Slab:           55338836 kB
zzz ***Wed Jun 29 10:30:46 CEST 2022
MemAvailable:     599560 kB
Slab:           55715568 kB
zzz ***Wed Jun 29 10:31:16 CEST 2022
MemAvailable:     555600 kB
Slab:           55861032 kB
zzz ***Wed Jun 29 10:31:47 CEST 2022
MemAvailable:     556200 kB
Slab:           56051892 kB
zzz ***Wed Jun 29 10:32:17 CEST 2022
MemAvailable:     491812 kB
Slab:           56165600 kB
zzz ***Wed Jun 29 10:32:48 CEST 2022
MemAvailable:     468500 kB
Slab:           56284980 kB
zzz ***Wed Jun 29 10:33:18 CEST 2022
MemAvailable:    2326488 kB
Slab:           74419980 kB
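
The growth rate becomes even more obvious when the per-interval deltas are computed from the same oswmeminfo file. A minimal awk sketch (it simply pairs each Slab sample with the preceding one; in the zzz timestamp lines the fifth field is the time of day):

egrep "zzz|Slab" /u00/oracle/GI/gridbase/oracle.ahf/data/repository/suptools/host/oswbb/grid/archive/oswmeminfo/host.vt.ch_meminfo_22.06.29.1000.dat |
awk '/zzz/ { ts = $5 }
     /Slab/ { if (prev != "") printf "%s  Slab delta: %+d kB\n", ts, $2 - prev; prev = $2 }'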

ACFS buffer cache

slabtop provides the breakdown of the slab allocations:

slabtop
Active / Total Objects (% used)    : 20455612 / 20599104 (99.3%)
Active / Total Slabs (% used)      : 2798200 / 2798200 (100.0%)
Active / Total Caches (% used)     : 127 / 184 (69.0%)
Active / Total Size (% used)       : 76278221.22K / 76313158.69K (100.0%)
Minimum / Average / Maximum Object : 0.01K / 3.70K / 64.01K

OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
9176426 9176426 100%    0.28K 164002       56   2624032K acfs_ofsbuf_cache
9176260 9176260 100%    8.00K 2608768        4  83480576K acfs_buffer_cache_4k
420352 420352 100%    0.01K    821      512      3284K kmalloc-8
378250 378250 100%    0.02K   2225      170      8900K avtab_node
298880 294703  98%    0.03K   2335      128      9340K kmalloc-32
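
As a side note, slabtop can also print a single snapshot sorted by cache size and exit, which is handy when collecting this breakdown from a script rather than interactively:

slabtop -o -s c | head -15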

Practically the whole memory was consumed by the ACFS buffer cache. At this point, it wasn't clear what had caused these extreme memory allocations. So, I started observing the slab growth in real time after restarting the VM:

while :
do
date
grep Slab /proc/meminfo
echo 
sleep 1
done

Mon Jul 11 14:36:08 CEST 2022
Slab:             421856 kB

Mon Jul 11 14:36:10 CEST 2022
Slab:             421888 kB

Mon Jul 11 14:36:11 CEST 2022
Slab:            2733208 kB

Mon Jul 11 14:36:12 CEST 2022
Slab:            7733784 kB

Mon Jul 11 14:36:13 CEST 2022
Slab:           13214688 kB

Mon Jul 11 14:36:14 CEST 2022
Slab:           18859680 kB

Mon Jul 11 14:36:15 CEST 2022
Slab:           24406444 kB

Mon Jul 11 14:36:16 CEST 2022
Slab:           29065932 kB

Mon Jul 11 14:36:17 CEST 2022
Slab:           33006508 kB

Mon Jul 11 14:36:18 CEST 2022
Slab:           37035908 kB

Mon Jul 11 14:36:19 CEST 2022
Slab:           40773812 kB

Mon Jul 11 14:36:21 CEST 2022
Slab:           43771604 kB

Mon Jul 11 14:36:22 CEST 2022
Slab:           46700020 kB

While slab was increasing, I traced kernel memory allocations for a very short time:

/usr/share/bcc/tools/trace -T 't:kmem:kmem_cache_alloc' > trace 2> /dev/null

Simple trace postprocessing revealed that most of the allocations were done by the kernel thread acfsnimap:

grep 14:36:16 trace | awk '{print $4}' | sort | uniq -c | sort -k 1 -n -r

  50962 acfsnimap
   3872 acfssnap_tmp
   1577 UsmRdRb-02
   1288 kworker/4:1H
    988 UsmRdRb-00
    503 UsmRdRb-05
    450 UsmRdRb-04
    229 UsmRdRb-01
    207 UsmRdRb-03
    202 UsmRdRb-06
    166 grep
    128 bash
     40 crscdpd.bin
     32 asm_imr0_+asm1
     26 kauditd
     18 orarootagent.bi
     17 auditd
     12 osysmond.bin
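
As an aside, the same per-process ranking could have been obtained without writing a raw trace file, by aggregating the events directly in the kernel. A sketch with bpftrace (assuming it is installed), counting kmem_cache_alloc events by command name for five seconds:

bpftrace -e 'tracepoint:kmem:kmem_cache_alloc { @allocs[comm] = count(); } interval:s:5 { exit(); }'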

Because of the substring “sn” in its name, I assumed that the kernel thread acfsnimap is involved in ACFS snapshot handling. Consequently, I inspected the acfs log files.
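
Before digging into the logs, a quick sanity check confirms that acfsnimap really is a kernel thread (kernel threads are children of kthreadd, PID 2) and that the ACFS kernel modules are loaded. The module names below are the ones usually shipped with Grid Infrastructure; adjust if your installation differs:

ps -o pid,ppid,comm -C acfsnimap
lsmod | grep -E "oracleacfs|oracleadvm|oracleoks"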

Corrupt snapshot

The last entries in /u00/oracle/GI/gridbase/crsdata/host/acfs/event.log.0 from when the issue first appeared indicated that some snapshots had been deleted:


2022-06-29 10:25:32.648 (5064080.797) : acfsutil.bin[1958274] Start deleting snapshot from ACFS file system F8: snapshot name:CDB_2022-06-16_15_39_37_PDB dev_path:/dev/asm/pdb_1-417 mountPoint:/u00/oracle/oradata01/CDB/PDB
2022-06-29 10:25:32.709 (5064080.858) : acfsutil.bin[1958274] Successfully deleted snapshot from ACFS file system F8: snapshot name:CDB_2022-06-16_15_39_37_PDB dev_path:/dev/asm/pdb_1-417 mountPoint:/u00/oracle/oradata01/CDB/PDB
2022-06-29 10:26:03.097 (5064111.246) : acfsutil.bin[1958752] Start deleting snapshot from ACFS file system F1: snapshot name:CDB_2022-06-16_15_39_37 dev_path:/dev/asm/cdb_1-417 mountPoint:/u00/oracle/oradata01/CDB
2022-06-29 10:26:05.264 (5064113.413) : acfsutil.bin[1958752] Successfully deleted snapshot from ACFS file system F1: snapshot name:CDB_2022-06-16_15_39_37 dev_path:/dev/asm/cdb_1-417 mountPoint:/u00/oracle/oradata01/CDB

According to the event.log, the snapshots had been successfully deleted a long time before. But the “acfsutil snap info” command was still showing “delete in progress” for one of the deleted snapshots:

acfsutil snap info /u00/oracle/oradata01/CDB/PDB
snapshot name:               CDB_2022-06-16_15_39_37_PDB (delete in progress)
snapshot creation time:      Thu Jun 16 15:40:01 2022

    number of snapshots:  0 (active)
                          1 (delete in progress)
    kilosnap state:       Enabled
    storage interest tracking: Enabled
    snapshot space usage: 130131570688  ( 121.19 GB )

In other words, even though “acfsutil snap delete” had completed without errors, the snapshot still hadn’t actually been deleted. Whenever the file system was mounted, the pending delete bloated the slab until it consumed all the memory on the server.
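
If several ACFS file systems are mounted, a quick loop over the mount points shows whether any of them carries a snapshot stuck in “delete in progress”. A rough sketch based on the standard mount output, where the third field is the mount point:

for mp in $(mount -t acfs | awk '{print $3}')
do
echo "### $mp"
acfsutil snap info "$mp" | grep -E "delete in progress|number of snapshots"
done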

Another error message in acfs.log also indicated that something was wrong with the snapshot:

F2 4303519.747 2022-07-11 14:36:10.644 acfssnap_tmp[80720] OfsSnapStorageThread: FAILED CREATE loaded sfeta for snapId 0x1

This problem isn’t related to the database load – the database wasn’t even running. It’s sufficient to mount the file system to reproduce the issue:


sudo /bin/mount -t acfs /dev/asm/cdb_1-417 /u00/oracle/oradata01/CDB
sudo /bin/mount -t acfs /dev/asm/pdb_1-417 /u00/oracle/oradata01/CDB/PDB

The allocated memory isn’t returned to the OS until a VM restart – it remains allocated even after unmounting the file system with the corrupt snapshot.
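
Whether the kernel could ever reclaim this memory on its own can be checked in /proc/meminfo: if the growth shows up under SUnreclaim rather than SReclaimable, neither memory pressure nor an explicit slab cache drop will release it:

grep -E "^(Slab|SReclaimable|SUnreclaim):" /proc/meminfo
sudo sh -c "sync; echo 2 > /proc/sys/vm/drop_caches"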

Recovery

I tried to limit the memory for the ACFS resource, but that didn’t work:

/u00/oracle/GI/product/21.3.0/bin/crsctl modify resource ora.data.pdb_1.acfs -attr WORKLOAD_MEMORY_MAX=1024 -unsupported

The first thing to do after hitting this problem is to identify the culprit. The relevant information is in the acfs and event log files in the /u00/oracle/GI/gridbase/crsdata/host/acfs directory. Then you have to stop the memory leak by disabling the file system containing the corrupt snapshot and the database that resides on it, for example:

/u00/oracle/GI/product/21.3.0/bin/crsctl modify resource ora.data.pdb_1.acfs -attr AUTO_START=never -unsupported

/u00/oracle/GI/product/21.3.0/bin/crsctl modify resource ora.pdb.db -attr AUTO_START=never -unsupported

Finally, you have to delete the file system, create a new one and restore the database.
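
A rough outline of these steps, assuming the volume is PDB_1 in the disk group DATA (as the resource name ora.data.pdb_1.acfs suggests); the size and the new volume device name are placeholders and have to be adapted to your environment:

# drop the Clusterware file system resource and the ADVM volume behind it
srvctl remove filesystem -device /dev/asm/pdb_1-417
asmcmd voldelete -G DATA PDB_1

# recreate the volume, format it with ACFS and register the file system again
asmcmd volcreate -G DATA -s 150G PDB_1
mkfs -t acfs /dev/asm/pdb_1-<new_device_suffix>
srvctl add filesystem -device /dev/asm/pdb_1-<new_device_suffix> -path /u00/oracle/oradata01/CDB/PDB -user oracle

After that, the database can be restored onto the freshly created file system.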

Unfortunately, I’m still unaware of a way to get rid of a snapshot with a pending delete without dropping the whole file system. The snapshot directory /u00/oracle/oradata01/CDB/PDB/.ACFS/snaps was already empty.


Nenad Noveljic
