Tracking Arguments with gdb

DTrace

Being able to track function arguments is essential for in-depth troubleshooting and researching software internals. DTrace makes this extremely easy, because all of the arguments are exposed through the variables arg0, arg1, arg2, etc. in the function entry probe.

I’m going to use the following C program from [Yurichev 2013] for demonstration purposes:

#include <stdio.h>

void f1(int a1, int a2, int a3, int a4, int a5, int a6, int a7)
{
  printf ("%d %d %d %d %d %d %d\n", a1, a2, a3, a4, a5, a6, a7);
};

int main()
{
  f1(1,2,3,4,5,6,7);
};

I compiled it with gcc:

gcc f1.c -o f1

As you can see, capturing arguments is really trivial with DTrace:

sudo -u root /usr/sbin/dtrace -qn 'pid$target::f1:entry{ printf("%d %d %d %d %d %d %d \n",arg0,arg1,arg2,arg3,arg4,arg5,arg6);}' -p PID
1 2 3 4 5 6 7 

gdb is an alternative where DTrace isn’t available. However, the complexity increases when the debugging information is stripped, which is rather common with proprietary software. To show how to track arguments under such circumstances, I’m going to disassemble the compiled C program and observe how the arguments get passed around. Dennis Yurichev explained this reverse engineering technique in [Yurichev 2013].

Integers

Generally, parameters can be passed in two ways: via CPU registers and via the stack. Which registers are used is precisely defined by the calling convention, which is specific to the CPU architecture and the OS. The x86 calling conventions are described, for instance, in [Wikipedia].

On x86-64 Linux and Solaris, integer arguments are passed via the rdi, rsi, rdx, rcx, r8 and r9 registers, in that order. Since only 6 parameters can be passed through these registers, the rest must be passed via the stack. The disassembled main function indeed shows how the first 6 argument values are moved to the registers prior to calling the f1 function:

0x0000000000400fc4 <+0>:     push   %rbp
0x0000000000400fc5 <+1>:     mov    %rsp,%rbp
0x0000000000400fc8 <+4>:     sub    $0x8,%rsp
0x0000000000400fcc <+8>:     pushq  $0x7
0x0000000000400fce <+10>:    mov    $0x6,%r9d
0x0000000000400fd4 <+16>:    mov    $0x5,%r8d
0x0000000000400fda <+22>:    mov    $0x4,%ecx
0x0000000000400fdf <+27>:    mov    $0x3,%edx
0x0000000000400fe4 <+32>:    mov    $0x2,%esi
0x0000000000400fe9 <+37>:    mov    $0x1,%edi
0x0000000000400fee <+42>:    callq  0x400f72 <f1>
0x0000000000400ff3 <+47>:    add    $0x10,%rsp
0x0000000000400ff7 <+51>:    mov    $0x0,%eax
0x0000000000400ffc <+56>:    leaveq 
0x0000000000400ffd <+57>:    retq   

By the way, I’m using Franck Pachot’s one-liner for disassembling functions provided in [Pachot 2017]:

gdb f1 <<< "disas main"

Since the integer arguments fit into 32 bits they are passed via edi, esi, edx, ecx, r8d and r9d registers. All of those 32-bit registers exist for efficiently accessing the least significant 32 bits of the 64-bit registers rdi, rsi, rdx, rcx, r8 and r9, respectively.

Thus, we can display the first 6 integer arguments by printing the register values when the function f1 is entered:

b f1 
c
Thread 2 hit Breakpoint 1, 0x0000000000400f76 in f1 ()

printf "%d, %d, %d, %d, %d, %d \n", $edi, $esi, $edx, $ecx, $r8d, $r9d 
1, 2, 3, 4, 5, 6 

Stack

The 7th argument was pushed onto the stack prior to calling f1:

0x0000000000400fcc <+8>:     pushq  $0x7

Before retrieving the argument from the stack within f1, we have to know exactly how many values are on the stack in front of the argument.

But first, let’s take a closer look at the addresses. Below is the disassembled f1 function:

gdb f1 <<< "disas f1"

0x0000000000400f72 <+0>:     push   %rbp
0x0000000000400f73 <+1>:     mov    %rsp,%rbp
0x0000000000400f76 <+4>:     sub    $0x20,%rsp
0x0000000000400f7a <+8>:     mov    %edi,%eax
...

It’s worth noting that the function begins at the address 0x400f72, but the breakpoint was set 4 bytes further, after the first two instructions, at the address 0x400f76:

info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000000000400f76 <f1+4>

The code fragment in between is the function prologue, which first saves the caller’s frame pointer (stored in rbp) on the stack and then creates the callee’s frame by copying the stack pointer rsp into rbp. Gustavo Duarte described this mechanism in detail in [Duarte 2014]. Since push %rbp decrements rsp by 8, the argument’s position on the stack relative to the stack pointer rsp changes once the prologue has executed.

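The stack looks as follows before and after the prologue execution. On function entry, rsp points to the return address, and the 7th argument is located in the next slot, at rsp+8. After push %rbp, rsp points to the saved frame pointer, the return address is at rsp+8, and the 7th argument is at rsp+16.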

When retrieving the argument, we have to know exactly where the program stopped. In our example, gdb detected the prologue and set the breakpoint at its end. But this mechanism isn’t reliable, because gdb doesn’t always recognize the prologue. This can happen, for example, when some compiler optimizations are employed. Fortunately, we can force a breakpoint at the function entry, i.e. before the prologue, by specifying * before the function name:

b *f1
Breakpoint 1 at 0x400f72

Now the 7th argument can be safely retrieved from the second location on the stack, because the first location holds the return address pushed by callq:

x/d $rsp+8
0x7fffbffff930: 7

Floating point

The floating point arguments are passed via xmm0 – xmm7 registers.

We can easily demonstrate this by converting the first argument to floating point:

#include <stdio.h>

void f1(float a1, int a2, int a3, int a4, int a5, int a6, int a7)
{
  printf ("%f %d %d %d %d %d %d\n", a1, a2, a3, a4, a5, a6, a7);
};

int main()
{
  f1(1,2,3,4,5,6,7);
};

The floating point value is stored as a constant at the address 0x400d64 and loaded from there into the xmm0 register:


0x0000000000400fce <+0>:     push   %rbp
0x0000000000400fcf <+1>:     mov    %rsp,%rbp
0x0000000000400fd2 <+4>:     mov    $0x7,%r9d
0x0000000000400fd8 <+10>:    mov    $0x6,%r8d
0x0000000000400fde <+16>:    mov    $0x5,%ecx
0x0000000000400fe3 <+21>:    mov    $0x4,%edx
0x0000000000400fe8 <+26>:    mov    $0x3,%esi
0x0000000000400fed <+31>:    mov    $0x2,%edi
0x0000000000400ff2 <+36>:    movss  -0x296(%rip),%xmm0        # 0x400d64
0x0000000000400ffa <+44>:    callq  0x400f72 <f1>
0x0000000000400fff <+49>:    mov    $0x0,%eax
0x0000000000401004 <+54>:    pop    %rbp
0x0000000000401005 <+55>:    retq   
   
x/f 0x400d64
0x400d64:       1

As expected, no parameters are passed via the stack anymore, because the remaining 6 integer parameters fit into the general-purpose registers.

The value of the first argument can be retrieved on the f1 entry point as follows:

p $xmm0
$1 = {v4_float = {1, 0, 0, 0}, v2_double = {5.2635442471208903e-315, 0}, v16_int8 = {0, 0, -128, 63, 0 <repeats 12 times>}, v8_int16 = {0, 16256, 0, 0, 0, 0, 0, 0}, v4_int32 = {1065353216, 0, 0, 0}, 
  v2_int64 = {1065353216, 0}, uint128 = 1065353216}

Pointers

To demonstrate how to retrieve the variables passed by reference, I modified the C program to pass a pointer to a string as the first argument. Additionally, I added the 8th argument – also a pointer to a string – in order to show how to dereference pointers when passed in both ways: via register and via stack.

#include <stdio.h>
#include <stdint.h>

void f1(char* a1, int a2, int a3, int a4, int a5, int a6, int a7, char* a8)
{
  printf ("%s %d %d %d %d %d %d %s\n", a1, a2, a3, a4, a5, a6, a7, a8);
};

int main()
{
  uint64_t dummy = 1 ;
  char *str1 = "String 1";
  char *str2 = "String 8";
  f1(str1,2,3,4,5,6,7,str2);
};

We can see that for pointers the whole 64-bit rdi register is being used – as opposed to the least significant 32 bits referenced via edi when dealing with integers.

0x0000000000400fde <+0>:     push   %rbp
0x0000000000400fdf <+1>:     mov    %rsp,%rbp
0x0000000000400fe2 <+4>:     sub    $0x10,%rsp
0x0000000000400fe6 <+8>:     movq   $0x400d65,-0x8(%rbp)
0x0000000000400fee <+16>:    movq   $0x400d6e,-0x10(%rbp)
0x0000000000400ff6 <+24>:    mov    -0x8(%rbp),%rax
0x0000000000400ffa <+28>:    pushq  -0x10(%rbp)
0x0000000000400ffd <+31>:    pushq  $0x7
0x0000000000400fff <+33>:    mov    $0x6,%r9d
0x0000000000401005 <+39>:    mov    $0x5,%r8d
0x000000000040100b <+45>:    mov    $0x4,%ecx
0x0000000000401010 <+50>:    mov    $0x3,%edx
0x0000000000401015 <+55>:    mov    $0x2,%esi
0x000000000040101a <+60>:    mov    %rax,%rdi
0x000000000040101d <+63>:    callq  0x400f82 <f1>
0x0000000000401022 <+68>:    add    $0x10,%rsp
0x0000000000401026 <+72>:    mov    $0x0,%eax
0x000000000040102b <+77>:    leaveq 
0x000000000040102c <+78>:    retq   

Both string constants are stored at fixed addresses:

(gdb) x/s 0x400d65
0x400d65:       "String 1"
(gdb) x/s 0x400d6e
0x400d6e:       "String 8"

First, I’ll run the program and stop the execution after entering f1:

b *f1
Breakpoint 1 at 0x400f82: file f1.c, line 5.
r

The pointer to the first argument is passed via the rdi register, which we can simply dereference as follows:

x/s $rdi
0x400d65:       "String 1"

The pointer to the string passed as the 8th argument, which is the second argument passed via the stack, is stored in the third stack location, after the return address and the 7th argument. Therefore, it can be dereferenced as follows:

x/s *(uint64_t *)($rsp+8+8)
0x400d6e:       "String 8"

A side note: I had to include the stdint.h header file and declare a uint64_t variable just to be able to use the uint64_t type when dereferencing the pointer in gdb. For the same reason, I additionally compiled the program with the -g option to include the debugging information:

gcc -g f1.c -o f1

Tracing Oracle memory allocations

Finally, it’s time to do something useful, such as tracing Oracle memory allocations. These are performed within the Oracle C functions kghalf, kghalo and kghalp.

Tanel Poder and Stefan Koehler have already written DTrace scripts for tracing their execution: trace_kghal.sh and dtrace_kghal_pga_code, respectively.

By looking into their DTrace scripts, we can identify the following information of interest:

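  • heap name, which is located at the address arg1+76, i.e. at offset 76 from the address passed as the second argument,
  • allocation reason, whose pointer is passed via arg5 (the sixth argument) in kghalf and kghalp, and via arg8 (the ninth argument) in kghalo.

Since DTrace numbers the arguments starting with arg0, arg1 corresponds to the rsi register and arg5 to the r9 register. Knowing that, it’s fairly easy to come up with the gdb commands:

b *kghalf
commands 1
silent
printf "kghalf: %s - %s \n", (char *)($rsi+76), (char *)($r9)
continue
end

b *kghalp
commands 2
silent
printf "kghalp: %s - %s \n", (char *)($rsi+76), (char *)($r9)
continue
end

Notice that kghalo receives the allocation reason as its ninth argument (arg8), which is therefore passed via the stack. Being the third argument on the stack, it is located in the fourth stack location at the function entry: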

b *kghalo
commands 3
silent
printf "kghalo: %s - %s \n", (char *)($rsi+76), (char *)(*(uint64_t *)($rsp+8+8+8))
continue
end

Optionally, you can set the following parameters for spooling the output into a file:

set pagination off
set logging file kgh_allocations.log
set logging on

attach PID
c

set logging off

The sample output looks as follows:

...
kghalo: SQLA^bf668378 - qbcqtcHTHeap 
kghalp: 13910.kgght - 13910.kgght 
kghalo: qbcqtcHTHeap - 13910.kgght 
kghalp: 13910.kgght - 13910.kgght 
kghalo: qbcqtcHTHeap - 13910.kgght 
kghalo: SQLA^bf668378 - qbcqtcHTHeap 
kghalo: sga heap - SQLA^bf668378 
kghalf: qbcqtcHTHeap - 613.kggec 
kghalo: SQLA^bf668378 - qbcqtcHTHeap 
kghalp: 613.kggec - 613.kggec 
kghalo: qbcqtcHTHeap - 613.kggec 
kghalp: 613.kggec - 613.kggec 
kghalo: qbcqtcHTHeap - 613.kggec 
kghalo: SQLA^bf668378 - qbcqtcHTHeap 
kghalo: sga heap - SQLA^bf668378 
kghalp: SQLA^bf668378 - qcpifqtqc : qcsidn 
kghalf: SQLA^bf668378 - qcpifqtqc : qcsidn 
kghalp: SQLA^bf668378 - frodef:qcpitnm 
kghalf: SQLA^bf668378 - frodef:qcpitnm 
kghalp: SQLA^bf668378 - idndef : qcuAllocIdn 
kghalf: SQLA^bf668378 - idndef : qcuAllocIdn 
kghalp: SQLA^bf668378 - idndef : qcuAllocIdn 
kghalf: SQLA^bf668378 - idndef : qcuAllocIdn 
kghalp: SQLA^bf668378 - chedef : qcuatc 
kghalf: SQLA^bf668378 - chedef : qcuatc 
kghalp: SQLA^bf668378 - qbpdef: qekbCreateQbp 
kghalf: SQLA^bf668378 - qbpdef: qekbCreateQbp 
kghalp: kxs-heap-c - opsdef: qcpipsh1 
kghalp: kxs-heap-c - chedef : qcuatc 
kghalp: KGLH0^bf668378 - kgltbtab2 
kghalo: KGLH0^bf668378 - kgltbtab 
...

References

  • [Yurichev 2013] Dennis Yurichev. (2013). Reverse Engineering for Beginners (Understanding Assembly Language)
  • [Wikipedia] Wikipedia. X86 calling conventions
  • [Pachot 2017] Franck Pachot. (January 5, 2017). 12cR2: no cardinality feedback for small queries
  • [Duarte 2014] Gustavo Duarte. (March 10, 2014). Journey to the Stack, Part I

Nenad Noveljic