Tracking the Optimizer with gdb

lmtdcrel

lmtdcrel is an Oracle database C function that compares two double-precision float-point numbers. The input values are passed in the XMM0 and XMM1 CPU registers. The result, which is returned in the $EAX register, is calculated as follows:

  • 1 if XMM0 > XMM1
  • 0 if XMM0 == XMM1
  • -1 if XMM0 < XMM1

The significands, a.k.a. mantissas, of both numbers in the modified normalized form are rounded to the 7th decimal place prior to comparison.

Since the Oracle C functions aren’t documented, I had to verify a lot of executions in order to confirm that this theory holds over a broad range of values. Therefore, I automated this process with some gdb commands and a Python script.

Automated verification

The following commands at the entry and exit breakpoints, respectively, print the values of the registers involved:

silent
p $xmm0
p $xmm1
continue
end
silent
p/d $eax
continue
end

The output is spooled into the log file:

set pagination off
set logging file lmtdcrel.log
set logging on

In the excerpt from the log file below, 53.161750863128702, stored in the XMM0 register, is compared to 53.186755259284759 which is stored XMM1. In this case, the function returns -1, because XMM0 contains the lower value.

$5717 = {v4_float = {4.6755867, 3.16532612, 0, 0}, v2_double = {53.161750863128702, 0}, v16_int8 = {104, -98, -107, 64, -76, -108, 74, 64, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {-24984, 16533, -27468, 16458, 0, 0, 0, 0}, v4_int32 = {1083547240, 1078629556, 0, 0}, v2_int64 = {4632678668602547816, 0}, uint128 = 4632678668602547816}
$5718 = {v4_float = {-4.38126602e-24, 3.16552138, 0, 0}, v2_double = {53.186755259284759, 0}, v16_int8 = {-11, 125, -87, -104, -25, -105, 74, 64, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {32245, -26455, -26649, 16458, 0, 0, 0, 0}, v4_int32 = {-1733722635, 1078630375, 0, 0}, v2_int64 = {4632682187658460661, 0}, uint128 = 4632682187658460661}
$5719 = -1

The Python program lmtdcrel_verifier.py parses the gdb log file, extracts the inputs and return values, simulates the function lmtdcrel and compares the simulation results with the real ones.

Practical application

Unsurprisingly, the function lmtdcrel is extensively used by the cost based optimizer (CBO) for comparing the costs of different access paths. A lot of such decisions are logged into the CBO trace, which is knowingly an invaluable source of information. But the CBO doesn’t trace everything – and gdb might fill this gap. By inspecting the lmtdcrel’s arguments before making a decision we can deduce what’s driving that decision.

In the example below, the function was called by kkqctUpdBestCostStt:

#0  0x0000000017fac680 in lmtdcrel ()
#1  0x0000000017111b7b in kkqctUpdBestCostStt ()
#2  0x00000000171169b4 in kkqctNxtLinearStt ()
#3  0x000000000a0e88b6 in kkqctdrvJPPD ()
#4  0x000000000a0e6f6e in kkqjpdctr ()
#5  0x0000000009e1be76 in qksqbApplyToQbcLoc ()
#6  0x0000000009e1bbc4 in qksqbApplyToQbc ()
#7  0x0000000009e1940c in kkqctdrvTD ()
#8  0x000000000a0e65a4 in kkqjpddrv ()
#9  0x0000000009e11b63 in kkqdrv ()
#10 0x0000000009e0f043 in kkqctdrvIT ()
#11 0x0000000009c306aa in apadrv ()
#12 0x0000000009c2a606 in opitca ()
...

The function kkqctUpdBestCostStt keeps track of the most efficient query block (QB) transformation state. When doing so, it relies on lmtdcrel for comparing costs.

Anyway, we hit the breakpoint on lmtdcrel in the stack above and are now inspecting the register values: XMM0 = 53.161750863128702 and XMM1 = 53.186755259284759.

Then we’re looking for the values in the optimizer trace to figure out what’s being compared. For instance, the value in the XMM1 register 53.1867 is the cost of the non-transformed QB:

JPPD: Starting iteration 1, state space = (2) : (0)
JPPD: Performing join predicate push-down (no transformation phase) from query block SEL$A1EEA789 (#1) to query block SEL$2 (#2)
...
JPPD: Updated best state, Cost = 53.186755

Similarly, 53.161751 is the cost of the transformed QB:

JPPD: Costing transformed query.
...
Final cost for query block SEL$A1EEA789 (#1) - All Rows Plan:
   2235   Best join order: 1
   2236   Cost: 53.161751  Degree: 1  Card: 1.000000  Bytes: 59.000000

Expectedly, the optimizer makes the decision to transform the query block because of the lower cost.

JPPD: Performing join predicate push-down (final phase) from query block SEL$A1EEA789 (#1) to query block SEL$2 (#2)

Admittedly, this is a very simple example, but in general, if we’re not sure which comparison leads to which decision, we can postulate a hypothesis which we can easily verify by manipulating register values (in a lab, of course!) when the program enters lmtdcrel. For instance, in our case we can set XMM0 to a value which is greater than XMM1 and then see what happens:

set $xmm0.uint128=0x0000000000000000405b94b440959e70

By the way, 0x0000000000000000405b94b440959e70 is just the hexadecimal representation of 110.32350172625752:

p $xmm0
$84 = {v4_float = {4.67559052, 3.43095112, 0, 0}, v2_double = {110.32350172625752, 0}, v16_int8 = {112, -98, -107, 64, -76, -108, 91, 64, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {-24976, 16533, -27468, 16475, 0, 0, 0, 0}, v4_int32 = {1083547248, 1079743668, 0, 0}, v2_int64 = { 4637463743206628976, 0}, uint128 = 4637463743206628976}

Finally, we can find the following line in the CBO trace, which confirms that the transformation wasn’t done:

JPPD: Performing join predicate push-down (no transformation phase) from query block SEL$A1EEA789 (#1) to query block SEL$2 (#2)

In conclusion, observing a low-level Oracle C function, such as lmtdcrel and combining that information with the CBO trace can provide additional insights into the optimizer’s internal functioning.

Thanks for sharing

Nenad Noveljic

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.