In a previous blog post I explained how to gather call stacks with DTrace every time a string of interest gets written into a trace file. This technique comes in handy when researching propriatery software internals. I’ve been using it often to study optimizer; among others, to identify Oracle C functions participating in various query transformations. Unfortunately, DTrace isn’t always available, so I came up with an alternative approach based on gdb.
Also here, I use the same tactic – printing the output buffer and backtrace in the intercepted OS kernel function write calls.
Almost always, there will be plenty of write calls, so we have to be very selective about when to stop and capture the information. The write function specification helps us narrowing down the focus:
NAME
write, pwrite, writev - write on a file
SYNOPSIS
ssize_t write(int fildes, const void *buf, size_t nbyte);
...
DESCRIPTION
The write() function attempts to write nbyte bytes from the buffer
pointed to by buf, to the file associated with the open file descrip-
tor, fildes.
...
;
The first argument is the file descriptor. It can be retrieved for the traced process with the pfiles utility:
pfiles PID
...
15: S_IFREG mode:0640 dev:275,65861 ino:3334935 uid:1000 gid:1000 size:112333 (110KB)
O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE FD_CLOEXEC
/u00/oracle/orabase/diag/rdbms/db_site1/DB/trace/DB_ora_16204.trc
offset:112333 (110KB)
...
As you can see, the file descriptor for the dedicated server process’ trace file is 15. (I switched the optimizer tracing on prior to executing pfiles.) The second argument is the buffer address which contains string to be written. Since the first two arguments are passed via the CPU registers rdi and rsi, we can set a conditional breakpoint as follows:
b *write if $rdi == 15 && strstr($rsi,"JPPD")
The program execution will stop when the string JPPD (Join Predicate Push-Down) is to be written into the trace file.
The following breakpoint commands print out the buffer content and call stack before continuing with the further execution.
commands 1
silent
x/s $rsi
bt
continue
end
Also, I usually configure resuming on received signal (unwindonsignal on), logging to file as well as disable the interactive mode (pagination off).
attach PID
set unwindonsignal on
set pagination off
set logging file stacks5.log
set logging on
c
set logging off
Find below an example of a stack captured during a JPPD transformation:
0x7fffbfff1990: "JPPD: Checking validity of push-down from query block MAIN (#1) to query block RECEIVER (#2)\n"
#0 0x00007ffc3083d0c0 in write () from /lib/64/libc.so.1
#1 0x000000000b181d82 in sdbgrfuwf_write_file ()
#2 0x000000000b181cea in sdbgrfwf_write_file ()
#3 0x000000000b165ae0 in dbgtfdFileWrite ()
#4 0x000000000b0d1527 in dbgtfdFileAccessCbk ()
#5 0x000000000b163365 in dbgtfWriteRec ()
#6 0x000000001d6a375a in dbgtRecVAWriteDisk ()
#7 0x000000001d699e52 in dbgtGrpBVaList_int ()
#8 0x000000000b37ef1d in dbgtGrpB_int ()
#9 0x0000000019a38fb1 in kkqjpdcvvpd ()
#10 0x000000000a775125 in kkqjpdctr ()
#11 0x000000000a47ba33 in qksqbApplyToQbcLoc ()
#12 0x000000000a47b6e4 in qksqbApplyToQbc ()
#13 0x000000000a478d82 in kkqctdrvTD ()
#14 0x000000000a774544 in kkqjpddrv ()
#15 0x000000000a471250 in kkqdrv ()
#16 0x000000000a46e6a0 in kkqctdrvIT ()
#17 0x000000000a25e5bb in apadrv ()
#18 0x000000000a2583fa in opitca ()
#19 0x000000000a25796c in kksFullTypeCheck ()
#20 0x0000000009d2fbdc in rpiswu2 ()
#21 0x000000000a469bc6 in kksLoadChild ()
#22 0x000000000a24c25d in kxsGetRuntimeLock ()
#23 0x000000000a240ad0 in kksfbc ()
#24 0x000000000a463ee0 in kkspsc0 ()
#25 0x000000000a2ddd17 in kksParseCursor ()
#26 0x000000000a2d9876 in opiosq0 ()
#27 0x000000000a4636d7 in kpooprx ()
#28 0x000000000a393c25 in kpoal8 ()
#29 0x0000000009d08cee in opiodr ()
#30 0x0000000009d120d2 in ttcpip ()
#31 0x0000000009d0e318 in opitsk ()
#32 0x0000000009d0cf64 in opiino ()
#33 0x0000000009d08cee in opiodr ()
#34 0x0000000009d082b2 in opidrv ()
#35 0x0000000009d07f07 in sou2o ()
#36 0x0000000009d07d4e in opimai_real ()
#37 0x0000000009d07866 in ssthrdmain ()
#38 0x0000000009d075d7 in main ()
If you’d like to gather call stacks related to, for instance, the OR-Expansion (ORE), you can just change the breakpoint condition:
cond 1 $rdi == 15 && strstr($rsi,"ORE:")
Here’s an example of a captured call stack:
0x7fffbfff4c00: "ORE: after OR Expansion:******* UNPARSED QUERY IS *******\n"
#0 0x00007ffc3083d0c0 in write () from /lib/64/libc.so.1
#1 0x000000000b181d82 in sdbgrfuwf_write_file ()
#2 0x000000000b181cea in sdbgrfwf_write_file ()
#3 0x000000000b165ae0 in dbgtfdFileWrite ()
#4 0x000000000b0d1527 in dbgtfdFileAccessCbk ()
#5 0x000000000b188891 in dbgtfPutStr ()
#6 0x000000000b18806d in dbktPri ()
#7 0x000000000b2125e3 in ksdwrf ()
#8 0x000000001562296d in msqPrintQuery ()
#9 0x0000000010d4d986 in qksqbDumpAsSQL ()
#10 0x0000000019b48fb3 in kkqoreBuildUA ()
#11 0x0000000019b48231 in kkqoreTransform ()
#12 0x0000000019b40fbf in kkqoregdtr ()
#13 0x000000000a47ba33 in qksqbApplyToQbcLoc ()
#14 0x000000000a47b6e4 in qksqbApplyToQbc ()
#15 0x0000000019b40ecf in kkqoredrv ()
#16 0x000000000a470232 in kkqdrv ()
#17 0x000000000a46e6a0 in kkqctdrvIT ()
#18 0x000000000a25e5bb in apadrv ()
#19 0x000000000a2583fa in opitca ()
#20 0x000000000a46907b in kksLoadChild ()
#21 0x000000000a24c25d in kxsGetRuntimeLock ()
#22 0x000000000a240ad0 in kksfbc ()
#23 0x000000000a463ee0 in kkspsc0 ()
#24 0x000000000a2ddd17 in kksParseCursor ()
#25 0x000000000a2d9876 in opiosq0 ()
#26 0x000000000a4636d7 in kpooprx ()
#27 0x000000000a393c25 in kpoal8 ()
#28 0x0000000009d08cee in opiodr ()
#29 0x0000000009d120d2 in ttcpip ()
#30 0x0000000009d0e318 in opitsk ()
#31 0x0000000009d0cf64 in opiino ()
#32 0x0000000009d08cee in opiodr ()
#33 0x0000000009d082b2 in opidrv ()
#34 0x0000000009d07f07 in sou2o ()
#35 0x0000000009d07d4e in opimai_real ()
#36 0x0000000009d07866 in ssthrdmain ()
#37 0x0000000009d075d7 in main ()
Last but not least, a traced process normally executes many write calls. As a consequence, breakpoint handling will inevitably introduce an enormous overhead. For example, a 10053-traced optimization taking 5s to complete lasted almost 1.5 minutes after setting a breakpoint on write. Therefore, use this technique prudently – that is, only for experimenting on your sandbox databases. Interestingely, tracing with DTrace is much lighter – the same optimization lasted “only” 12s.
By the way, below are the links to case studies where I used gdb for researching Oracle database internals:
Thanks for sharing!! I am looking forward to experimenting with this particular technique.