Redo copy latch
In the previous blog post I showed how a DML can hold a redo copy latch for a very long time if private strands aren’t used. Since recursive statements during logon bypass private strands, this can cause serious latch wait cascades. In particular, most of this time is spent in the function kcbklbc, which links a changed block to the checkpoint queue.
Unlike the redo copy latch, the checkpoint queue latch is held only briefly. This indicates that the block itself is linked very quickly and that the extra time is spent elsewhere. That overhead is the main focus of this article.
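Before going deeper, a simple way to gauge whether these latches are a problem on a particular system is to look at their statistics. The following is just an illustrative query, not part of the original trace analysis:
select name, gets, misses, sleeps
from v$latch
where name in ('redo copy', 'checkpoint queue latch');
A high sleeps-to-gets ratio for the redo copy latch is the symptom that the rest of this article explains.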
Assembly Tracing
With DTrace, we can trace offsets [1]. An offset is the position of the instruction in bytes relative to the function start. As we shall see, we can get useful information by analyzing an offset trace even without disassembling the code. In addition to offsets, the following script traces latch allocations within the function.
#pragma D option dynvarsize=32m
#pragma D option quiet

/* Function entry (offset 0): record the start time, dump the user stack,
   and flag that we are inside kcbklbc. */
pid$target:oracle:kcbklbc:0
{
    self->started = timestamp;
    ustack();
    self->kcbklbc = 1;
}

/* Latch gets and frees issued while inside kcbklbc: print the latch address. */
pid$target:oracle:kslgetl:entry,
pid$target:oracle:kslfre:entry
/ self->kcbklbc /
{
    printf("%s %s %X\n", probefunc, probename, arg0);
}

/* Every probe within kcbklbc (one per traced instruction offset):
   print the elapsed time in microseconds and the offset. */
pid$target:oracle:kcbklbc:
{
    printf("%d offset: %s\n", (timestamp - self->started) / 1000, probename);
}

pid$target:oracle:kcbklbc:return
{
    self->kcbklbc = 0;
}
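The script can be attached to the traced server process with, for example, dtrace -s kcbklbc.d -p <spid>, with the output redirected to a file (the script name is arbitrary; kcbklbc.log, used further below, is such an output file).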
Here are the most relevant parts of a huge trace:
..
1 offset: 0
3 offset: 2
6 offset: 3
...
453 offset: 882
kslgetl entry 9E551EB8 --> acquired checkpoint queue latch latch
462 offset: 887
...
1048 offset: 4ac
kslfre entry 9E551EB8 --> released checkpoint queue latch latch
1057 offset: 4b1
...
1403 offset: d1d
kslfre entry 9C438688 --> released object queue header operation latch
1424 offset: d22
...
1539 offset: f13
1543 offset: f17
1555 offset: f1b
1566 offset: f23
1571 offset: f41
1575 offset: f45
1579 offset: f48
1581 offset: f13
1583 offset: f17
1585 offset: f1b
1588 offset: f23
1590 offset: f41
1592 offset: f45
1594 offset: f48
1596 offset: f13
1598 offset: f17
1600 offset: f1b
1610 offset: f23
1613 offset: f41
1615 offset: f45
1618 offset: f48
...
10012 offset: f13
10015 offset: f17
10017 offset: f1b
10021 offset: f23
10023 offset: f25
10025 offset: f2c
10027 offset: f33
10041 offset: f37
10044 offset: f3a
10053 offset: f3e
10055 offset: f41
10058 offset: f45
10060 offset: f48
10063 offset: f13
10066 offset: f17
10068 offset: f1b
10077 offset: f23
10080 offset: f25
10083 offset: f2c
10086 offset: f33
10095 offset: f37
10098 offset: f3a
10107 offset: f3e
10111 offset: f41
10113 offset: f45
10116 offset: f48
10118 offset: f13
10120 offset: f17
10122 offset: f1b
10125 offset: f23
...
30261 offset: return
The first column shows the time in microseconds relative to the function entry. The omitted parts are marked with “…”.
The latch names can be obtained from the database:
select name,addr from v$latch_children where addr like '%9E551EB8' or addr like '%9C438688';
NAME ADDR
---------------------------------------------------------------- ----------------
checkpoint queue latch 000000009E551EB8
object queue header operation 000000009C438688
From the trace above, we can observe the following:
- The whole traced execution lasted around 30ms.
- Most of the time is spent in a loop that executes after the last latch is released; several complete iterations, each running from offset f13 to f48, are visible in the excerpt above.
- None of the latches were acquired in the loop.
Working data set array
There were 1080 loop iterations:
grep f13 kcbklbc.log | wc -l
1080
The number of iterations coincides with the value of the undocumented parameter _db_block_lru_latches:
select a.ksppinm name,b.ksppstvl value
from sys.x$ksppi a, sys.x$ksppcv b
where a.indx = b.indx and b.ksppstvl = '1080'
;
NAME
--------------------------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
_db_block_lru_latches
1080
Each LRU latch protects a working data set [2]. There are indeed 1080 working data sets:
select count(*) from x$kcbwds;
COUNT(*)
----------
1080
The loop iterates over the working data sets. I double-checked that by changing the parameter value and verifying that the number of iterations changed accordingly.
So the more working data sets there are, the longer the redo copy latch is held. As the number of working data sets is proportional to the number of CPUs, the problem is worse on larger servers.
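The relationship is easy to check on any given system. The following is just a convenience query (it assumes SYS access for x$kcbwds):
select (select value from v$parameter where name = 'cpu_count') cpu_count,
       (select count(*) from x$kcbwds) working_data_sets
from dual;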
MTTR advisor
Furthermore, there are two types of iterations: one with 7 and one with 13 instructions. This variation reveals a conditional branch. Three instructions must be for loop control and three for the branch, which leaves just a few instructions doing the actual work: they calculate the sum of values obtained by dereferencing several pointers in the SGA.
The reads in the loop aren’t protected by a latch. This means that consistency isn’t critical. Jonathan Lewis pointed out that the result might be consumed by an advisor. Below are all advisor parameters:
@pd2.sql %advi%
Show all parameters and session values from x$ksppi/x$ksppcv...
NUM N_HEX NAME VALUE DESCRIPTION
---------- ----- -------------------------------------------------------- ------------------------------ ---------------------------------------------------------------------------------------------------
1825 721 db_cache_advice OFF Buffer cache sizing advisory
1826 722 _db_cache_advice_sample_factor 4 cache advisory sampling factor
1827 723 _db_cache_advice_max_size_factor 2 cache advisory maximum multiple of current size to similate
1828 724 _db_cache_advice_sanity_check FALSE cache simulation sanity check
1829 725 _db_cache_advice_hash_latch_multiple 16 cache advisory hash latch multiple
1830 726 _db_mttr_advice ON MTTR advisory
1833 729 _db_mttr_partitions 0 number of partitions for MTTR advisory
2690 A82 _simulate_io_wait 0 Simulate I/O wait to test segment advisor
2851 B23 _compression_advisor 0 Compression advisor
3245 CAD _library_cache_advice TRUE whether KGL advice should be turned on
3246 CAE _kglsim_maxmem_percent 5 max percentage of shared pool size to be used for KGL advice
3888 F30 _smm_advice_log_size 0 overwrites default size of the PGA advice workarea history log
3889 F31 _smm_advice_enabled TRUE if TRUE, enable v$pga_advice
3972 F84 _sta_control 0 SQL Tuning Advisory control parameter
3973 F85 _enable_automatic_sqltune TRUE Automatic SQL Tuning Advisory enabled parameter
4252 109C _partition_advisor_srs_active true enables sampling based partitioning validation
_db_mttr_advice was the first candidate, because I had previously discovered that the loop isn't executed when fast_start_mttr_target = 0. The program indeed doesn't iterate through the working data sets after disabling the MTTR advisor (_db_mttr_advice = OFF). Simply put, on a large server with many CPUs, redo copy latch waits can be massively reduced by disabling the MTTR advisor.
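For completeness, here is a sketch of how the advisor could be switched off. Changing underscore parameters should only be done in agreement with Oracle Support, and the statement below assumes an spfile and a subsequent restart:
-- setting fast_start_mttr_target = 0 also prevents the loop from executing, as noted above
alter system set "_db_mttr_advice" = 'OFF' scope=spfile;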
What data exactly does the MTTR advisor code read?
Checkpoint queues
Each working data set has two checkpoint queues. Even though the pointers to them are contained in the working data set structure, they are, sadly, not externalized in x$kcbwds. But we can obtain them with a debugger by peeking at the following addresses (at least on 19.7): x$kcbwds.addr+0xa0 and x$kcbwds.addr+0x1a8. For example:
select addr from x$kcbwds where rownum=1;
ADDR
----------------
000000009E49B868
(gdb) x/x 0x9E49B868+0xa0
0x9e49b908: 0x9e4dceb8
(gdb) x/x 0x9E49B868+0x1a8
0x9e49ba10: 0x9e4dcf18
We can confirm that the addresses above point to the checkpoint queues:
@fcha.sql 0x9e4dceb8
LOC KSMCHPTR KSMCHIDX KSMCHDUR KSMCHCOM KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR
--- ---------------- ---------- ---------- ---------------- ---------- -------- ---------- ----------------
SGA 000000009E4DCEA8 1 1 Checkpoint queu 48 perm 0 000000009E0CD000
@fcha.sql 0x9e4dcf18
LOC KSMCHPTR KSMCHIDX KSMCHDUR KSMCHCOM KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR
--- ---------------- ---------- ---------- ---------------- ---------- -------- ---------- ----------------
SGA 000000009E4DCF08 1 1 Checkpoint queu 48 perm 0 000000009E0CD000
The MTTR advisor code collects the values at offset 0x10 and sums them up:
(gdb) x/d 0x9e4dceb8+0x10
0x9e4dcec8: 0
(gdb) x/d 0x9e4dcf18+0x10
0x9e4dcf28: 0
Jonathan pointed out that these values are probably the numbers of blocks on the respective checkpoint queues. In the example above, there aren't any blocks on the observed checkpoint queues.
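As a plausibility check (my assumption, not something taken from the original trace): if each of the two checkpoint queues per working data set is protected by its own child latch, the number of checkpoint queue latch children should be twice the x$kcbwds count:
select count(*) checkpoint_queue_latches
from v$latch_children
where name = 'checkpoint queue latch';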
The MTTR advisor code is executed only after a block has been linked to the checkpoint queue. There are also kcbklbc executions that don't link any blocks and therefore hold the redo copy latch only for a very short time. Such cases can be recognized by the absence of checkpoint queue latch allocations in the trace.
Conclusion
In conclusion, DML that bypasses private strands piggybacks the data collection for the MTTR advisor. That code is executed immediately after a changed block is linked to the checkpoint queue. The function iterates over the working data set array while holding a redo copy latch. This isn't a big problem on a database with a small number of working data sets. But since the number of working data sets is proportional to the number of CPUs, logon storms can cause serious latch wait cascades on large servers. It's something to keep in mind when migrating to a bigger server or configuring hyper-threading, especially when the application doesn't use static connection pools.
References
- [1] Brendan Gregg, Jim Mauro, DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD. 2011.
- [2] Jonathan Lewis, Oracle Core: Essential Internals for DBAs and Developers (Expert’s Voice in Databases). 2012.
Update on October 2, 2020
The MTTR advisor code is the root cause of the following bug:
Bug 29924088 : 18.4 – HIGH CONCURRENCY FROM UPDATE USER$ – SQL_ID 9ZG9QD9BM4SPU