Tracking Wait Events with DTrace

Wait Interface Internals

This blog post describes the DTrace script I came up with while analyzing wrong data in an Oracle SQL Extended SQL Trace file. In particular, the purpose of the script mentioned is to calculate the wait event durations by tracing Oracle database functions and crosschecking the result with the trace data. Although the script is still experimental, it has been producing correct results on my test cases so far. I’ve been using it on 12.2.0.1.180116 databases and Solaris 11.3.

First of all, let me mention that the script is entirely based on the information presented in Stefan Koehler’s blog post Understanding the Oracle code instrumentation (wait interface) – A deep dive into what is really measured.

Before jumping to the DTrace script, I’d like summarize (and partially paraphrase) the most relevant information in Stefan’s article.

1. Oracle function kslwtbctx represents the start of a wait event.
2. Oracle function kslwtbctx calls the function kskthbwt to get/store the corresponding wait event (number).
3. Oracle function kslwtectx represents the end of a wait event.
4. The Solaris kernel function get_cur_hrtime returns the current time in nanoseconds relative to some arbitrary point in the past.
5. The Solaris kernel function get_cur_hrtime gets called before and after the wait event. Subsequently, the difference between two captured values is reported as the wait event elapsed time.

DTrace Script

1  #!/usr/sbin/dtrace
2  
3  #pragma D option quiet
4  
5  pid$target::kslwtbctx:entry
6  {
7    self->start_event = 1 ;
8  }
9  
10 pid$target::kslwtectx:entry
11 {
12   self->end_event = 1 ;
13 }
14 
15 pid$target::kskthbwt:entry
16 {
17   self->event_code = arg2 ;
18 }
19 
20 pid$target::get_cur_hrtime:return
21 / self->start_event /
22 {
23   self->start_ts = timestamp ;
24   self->start_event = 0 ;
25 }
26 
27 pid$target::get_cur_hrtime:return
28 / self->end_event /
29 {
30   printf("duration: %d ns\n", arg1 - self->start_ts);
31   printf("event code: %d\n", self->event_code );
32   self->end_event = 0 ;
33   self->start_ts = 0 ;
34 }
35 
36 pid$target::write:entry
37 /
38   strstr(fds[arg0].fi_pathname,".trc") != 0 && \
39   strstr(copyinstr(arg1),"WAIT") != 0
40 /
41 {
42   printf("%s",copyinstr(arg1));
43   printf("------------------------\n");
44 }

The variable self->start_ts stores the timestamp in nanosecondes returned by the Solaris kernel function get_cur_hrtime (line 23). Since the function get_cur_hrtime gets called in different places in the code, conditions on the lines 21 and 28 filter the executions made in the wait event start and end context, respectively. Later on, the duration is callculated and printed on the line 30. Besides that, the variable self->event_code on the line 17 stores the wait event code, which is then printed on the line 31. Last but not least, the Oracle SQL Extended Trace information is printed on the line 42.

Output

Here’s a sample output:


...
------------------------
duration: 4786884 ns
event code: 228
WAIT #18446604434613714256: nam='direct path write' ela= 4787 file number=1 first dba=453248 block cnt=128 obj#=74 tim=17223120819779
------------------------
...
------------------------
duration: 581623 ns
event code: 164
WAIT #18446604434566192160: nam='db file sequential read' ela= 582 file#=1 block#=120860 blocks=1 obj#=423 tim=17223125714073
------------------------
...

“Duration” (in nanoseconds) calculated by the script corresponds to the “ela” (in microseconds) in the SQL trace. The “event code” captured in the Oracle function kskthbwt refers to the name of the wait event (“nam”) in the trace file.

Finally, their relationship can be queried as follows:

 select EVENT#,NAME from V$EVENT_NAME where NAME in
  ('db file sequential read','direct path write');
    EVENT# NAME
---------- ----------------------------------------------------------------
       164 db file sequential read
       228 direct path write
Thanks for sharing

Nenad Noveljic

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.