Redo copy latch
In the previous blog post I showed how a DML can hold a redo copy latch for a very long time if private strands aren’t used. Since recursive statements during logon bypass private strands, this can cause serious latch wait cascades. In particular, most of this time is spent in the function kcbklbc, which links a changed block to the checkpoint queue.
Unlike the redo copy latch, the checkpoint queue latch is held only briefly. This indicates that the block itself is linked very quickly and that the extra time is spent elsewhere. That overhead is the main focus of this article.
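Before going deeper, a simple way to gauge whether these latches are a problem on a particular system is to look at their statistics. The following is just an illustrative query, not part of the original trace analysis:
select name, gets, misses, sleeps
from v$latch
where name in ('redo copy', 'checkpoint queue latch');
A high sleeps-to-gets ratio for the redo copy latch is the symptom that the rest of this article explains.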
Assembly Tracing
With DTrace, we can trace offsets [1]. An offset is the position of the instruction in bytes relative to the function start. As we shall see, we can get useful information by analyzing an offset trace even without disassembling the code. In addition to offsets, the following script traces latch allocations within the function.
#pragma D option dynvarsize=32m
#pragma D option quiet

/* Function entry (offset 0): record the start time, dump the user stack,
   and flag that we are inside kcbklbc. */
pid$target:oracle:kcbklbc:0
{
    self->started = timestamp;
    ustack();
    self->kcbklbc = 1;
}

/* Latch gets and frees issued while inside kcbklbc: print the latch address. */
pid$target:oracle:kslgetl:entry,
pid$target:oracle:kslfre:entry
/ self->kcbklbc /
{
    printf("%s %s %X\n", probefunc, probename, arg0);
}

/* Every probe within kcbklbc (one per traced instruction offset):
   print the elapsed time in microseconds and the offset. */
pid$target:oracle:kcbklbc:
{
    printf("%d offset: %s\n", (timestamp - self->started) / 1000, probename);
}

pid$target:oracle:kcbklbc:return
{
    self->kcbklbc = 0;
}
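The script can be attached to the traced server process with, for example, dtrace -s kcbklbc.d -p <spid>, with the output redirected to a file (the script name is arbitrary; kcbklbc.log, used further below, is such an output file).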
Here are the most relevant parts of a huge trace:
..
1 offset: 0
3 offset: 2
6 offset: 3
...
453 offset: 882
kslgetl entry 9E551EB8 --> acquired checkpoint queue latch latch
462 offset: 887
...
1048 offset: 4ac
kslfre entry 9E551EB8 --> released checkpoint queue latch latch
1057 offset: 4b1
...
1403 offset: d1d
kslfre entry 9C438688 --> released object queue header operation latch
1424 offset: d22
...
1539 offset: f13
1543 offset: f17
1555 offset: f1b
1566 offset: f23
1571 offset: f41
1575 offset: f45
1579 offset: f48
1581 offset: f13
1583 offset: f17
1585 offset: f1b
1588 offset: f23
1590 offset: f41
1592 offset: f45
1594 offset: f48
1596 offset: f13
1598 offset: f17
1600 offset: f1b
1610 offset: f23
1613 offset: f41
1615 offset: f45
1618 offset: f48
...
10012 offset: f13
10015 offset: f17
10017 offset: f1b
10021 offset: f23
10023 offset: f25
10025 offset: f2c
10027 offset: f33
10041 offset: f37
10044 offset: f3a
10053 offset: f3e
10055 offset: f41
10058 offset: f45
10060 offset: f48
10063 offset: f13
10066 offset: f17
10068 offset: f1b
10077 offset: f23
10080 offset: f25
10083 offset: f2c
10086 offset: f33
10095 offset: f37
10098 offset: f3a
10107 offset: f3e
10111 offset: f41
10113 offset: f45
10116 offset: f48
10118 offset: f13
10120 offset: f17
10122 offset: f1b
10125 offset: f23
...
30261 offset: return
The first column shows the time in microseconds relative to the function entry. The omitted parts are marked with “…”.
The latch names can be obtained from the database:
select name,addr from v$latch_children where addr like '%9E551EB8' or addr like '%9C438688';
NAME ADDR
---------------------------------------------------------------- ----------------
checkpoint queue latch 000000009E551EB8
object queue header operation 000000009C438688
From the trace above, we can observe the following:
- The whole traced execution lasted around 30ms.
- Most of the time is spent in a loop that executes after the last latch is released; several complete iterations, each running from offset f13 to f48, are visible in the excerpt above.
- None of the latches were acquired in the loop.
Working data set array
There were 1080 loop iterations:
grep f13 kcbklbc.log | wc -l
1080
The number of iterations coincides with the value of the undocumented parameter _db_block_lru_latches:
select a.ksppinm name,b.ksppstvl value
from sys.x$ksppi a, sys.x$ksppcv b
where a.indx = b.indx and b.ksppstvl = '1080'
;
NAME
--------------------------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
_db_block_lru_latches
1080
Each LRU latch protects a working data set [2]. There are indeed 1080 working data sets:
select count(*) from x$kcbwds;
COUNT(*)
----------
1080
The loop iterates over the working data sets. I double-checked that by changing the parameter value and verifying that the number of iterations changed accordingly.
So the more working data sets there are, the longer the redo copy latch is held. As the number of working data sets is proportional to the number of CPUs, the problem is worse on larger servers.
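The relationship is easy to check on any given system. The following is just a convenience query (it assumes SYS access for x$kcbwds):
select (select value from v$parameter where name = 'cpu_count') cpu_count,
       (select count(*) from x$kcbwds) working_data_sets
from dual;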
MTTR advisor
Furthermore, there are two types of iterations: one with 7 and one with 13 instructions. This variation reveals a conditional branch. Three instructions must be for loop control and three for the branch, which leaves just a few instructions doing the actual work: they calculate the sum of values obtained by dereferencing several pointers in the SGA.
The reads in the loop aren’t protected by a latch. This means that consistency isn’t critical. Jonathan Lewis pointed out that the result might be consumed by an advisor. Below are all advisor parameters:
@pd2.sql %advi%
Show all parameters and session values from x$ksppi/x$ksppcv...
NUM N_HEX NAME VALUE DESCRIPTION
---------- ----- -------------------------------------------------------- ------------------------------ ---------------------------------------------------------------------------------------------------
1825 721 db_cache_advice OFF Buffer cache sizing advisory
1826 722 _db_cache_advice_sample_factor 4 cache advisory sampling factor
1827 723 _db_cache_advice_max_size_factor 2 cache advisory maximum multiple of current size to similate
1828 724 _db_cache_advice_sanity_check FALSE cache simulation sanity check
1829 725 _db_cache_advice_hash_latch_multiple 16 cache advisory hash latch multiple
1830 726 _db_mttr_advice ON MTTR advisory
1833 729 _db_mttr_partitions 0 number of partitions for MTTR advisory
2690 A82 _simulate_io_wait 0 Simulate I/O wait to test segment advisor
2851 B23 _compression_advisor 0 Compression advisor
3245 CAD _library_cache_advice TRUE whether KGL advice should be turned on
3246 CAE _kglsim_maxmem_percent 5 max percentage of shared pool size to be used for KGL advice
3888 F30 _smm_advice_log_size 0 overwrites default size of the PGA advice workarea history log
3889 F31 _smm_advice_enabled TRUE if TRUE, enable v$pga_advice
3972 F84 _sta_control 0 SQL Tuning Advisory control parameter
3973 F85 _enable_automatic_sqltune TRUE Automatic SQL Tuning Advisory enabled parameter
4252 109C _partition_advisor_srs_active true enables sampling based partitioning validation
_db_mttr_advice was the first candidate, because I had previously discovered that the loop isn't executed when fast_start_mttr_target = 0. The program indeed doesn't iterate through the working data sets after disabling the MTTR advisor (_db_mttr_advice = OFF). Simply put, on a large server with many CPUs, redo copy latch waits can be massively reduced by disabling the MTTR advisor.
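For completeness, here is a sketch of how the advisor could be switched off. Changing underscore parameters should only be done in agreement with Oracle Support, and the statement below assumes an spfile and a subsequent restart:
-- setting fast_start_mttr_target = 0 also prevents the loop from executing, as noted above
alter system set "_db_mttr_advice" = 'OFF' scope=spfile;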
What data exactly does the MTTR advisor code read?
Checkpoint queues
Each working data set has two checkpoint queues. Even though the pointers to them are contained in the working data set structure, they are, sadly, not externalized in x$kcbwds. But we can obtain them with a debugger by peeking at the following addresses (at least on 19.7): x$kcbwds.addr+0xa0 and x$kcbwds.addr+0x1a8. For example:
select addr from x$kcbwds where rownum=1;
ADDR
----------------
000000009E49B868
(gdb) x/x 0x9E49B868+0xa0
0x9e49b908: 0x9e4dceb8
(gdb) x/x 0x9E49B868+0x1a8
0x9e49ba10: 0x9e4dcf18
We can confirm that the addresses above point to the checkpoint queues:
@fcha.sql 0x9e4dceb8
LOC KSMCHPTR KSMCHIDX KSMCHDUR KSMCHCOM KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR
--- ---------------- ---------- ---------- ---------------- ---------- -------- ---------- ----------------
SGA 000000009E4DCEA8 1 1 Checkpoint queu 48 perm 0 000000009E0CD000
@fcha.sql 0x9e4dcf18
LOC KSMCHPTR KSMCHIDX KSMCHDUR KSMCHCOM KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR
--- ---------------- ---------- ---------- ---------------- ---------- -------- ---------- ----------------
SGA 000000009E4DCF08 1 1 Checkpoint queu 48 perm 0 000000009E0CD000
The MTTR advisor code collects the values at offset 0x10 and sums them up:
(gdb) x/d 0x9e4dceb8+0x10
0x9e4dcec8: 0
(gdb) x/d 0x9e4dcf18+0x10
0x9e4dcf28: 0
Jonathan pointed out that these values are probably the numbers of blocks on the respective checkpoint queues. In the example above, there aren't any blocks on the observed checkpoint queues.
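As a plausibility check (my assumption, not something taken from the original trace): if each of the two checkpoint queues per working data set is protected by its own child latch, the number of checkpoint queue latch children should be twice the x$kcbwds count:
select count(*) checkpoint_queue_latches
from v$latch_children
where name = 'checkpoint queue latch';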
The MTTR advisor code is executed only after a block has been linked to the checkpoint queue. There are also kcbklbc executions that don't link any blocks and therefore hold the redo copy latch only for a very short time. Such cases can be recognized by the absence of checkpoint queue latch allocations in the trace.
Conclusion
In conclusion, DML that bypasses private strands piggybacks the data collection for the MTTR advisor. That code is executed immediately after a changed block is linked to the checkpoint queue. The function iterates over the working data set array while holding a redo copy latch. This isn't a big problem on a database with a small number of working data sets. But since the number of working data sets is proportional to the number of CPUs, logon storms can cause serious latch wait cascades on large servers. It's something to keep in mind when migrating to a bigger server or configuring hyper-threading, especially when the application doesn't use static connection pools.
References
- [1] Brendan Gregg, Jim Mauro, DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD. 2011.
- [2] Jonathan Lewis, Oracle Core: Essential Internals for DBAs and Developers (Expert’s Voice in Databases). 2012.
Update on October 2, 2020
The MTTR advisor code is the root cause of the following bug:
Bug 29924088 : 18.4 – HIGH CONCURRENCY FROM UPDATE USER$ – SQL_ID 9ZG9QD9BM4SPU