Unaccounted-for

In this blog post I’ll be sharing the insights obtained from analyzing a negative unaccounted-for time in an Oracle extended SQL trace profile.

Ideally, the time elapsed in a fetch database call is the sum of the cpu time and the total duration of the wait events during the fetch:

(1) $\begin{equation*} fetch\_elapsed = fetch\_cpu + \sum_{i \in wait\_events}duration_i \end{equation*}$

If that’s not the case, unaccounted-for is the magnitude of the discrepancy:

(2) $\begin{equation*} \begin{split} unaccounted\_for &= fetch\_elapsed - fetch\_cpu - \sum_{i \in wait\_events}duration_i \end{split} \end{equation*}$

In other words, unaccounted-for is the adjustment that needs to be made to the right side of the equation 1 in the case of a mismatch.

Wait Events

Wait event duration (a.k.a time waited) is the time it took to execute a part of Oracle database code instrumented by a wait event. Therefore, the naming “wait event” is somewhat confusing, because it implies that the server process is solely waiting on something. As we shall see later, that’s not necessarily true. For this reason, Cary Millsap prefers to use the term “syscall” instead of “wait” in his excellent book The Method R Guide to Mastering Oracle Trace Data . Cary’s naming convention describes the reality much better.

Anyway, the primary goal of a wait event is to instrument OS kernel functions executed within the Oracle database code. Generally, for any given wait event i the time spent in an OS kernel function consists of the time spent on the CPU and the time spent waiting on something:

(3) $\begin{equation*} wait\_event\_duration_i = cpu_i + waiting_i \end{equation*}$

After combining the equations 1 and 3 we get the following:

(4) $\begin{equation*} \begin{split} unaccounted\_for &= fetch\_elapsed - fetch\_cpu \\ &- \sum_{i \in wait\_events}(cpu_i + waiting_i) \end{split} \end{equation*}$

Finally, we can apply commutative law to the previous equation 4 to isolate the terms:

(5) $\begin{equation*} \begin{split} unaccounted\_for &= fetch\_elapsed - fetch\_cpu \\ &- \sum_{i \in wait\_events}cpu_i - \sum_{i \in wait\_events}waiting_i \end{split} \end{equation*}$

Usually, the instrumented OS kernel calls, such as IO operations, spend substantially more time waiting on something than consuming CPU. In fact, more often than not the CPU time consumed by such OS kernel functions is close to zero. In such case, the entire wait event duration is the time waited:

(6) $\begin{equation*} \begin{split} cpu_i & \approx 0 \implies \\ wait\_event\_duration_i & \approx waiting_i \\ unaccounted\_for & \approx fetch\_elapsed - fetch\_cpu - \sum_{i \in wait\_events}waiting_i \end{split} \end{equation*}$

CPU Intensive OS Kernel Functions

But what will happen if an instrumented OS kernel function spends a considerable amount of time on the CPU?

First, let’s take a look at the oversimplified pseudocode below to understand how fetch database call is instrumented:

begin_cpu_ts = slcpums

# BEGIN FETCH
loop 
  non-instrumented code
  code instrumented with a wait event
end loop
# END FETCH

end_cpu_ts = slcpums

fetch_cpu = end_cpu_ts - begin_cpu_ts

slcpums is an Oracle database function for getting total CPU time consumed by the process so far. As a consequence, the difference betweeen two consecutive returned values is the total amount of CPU time that the server proces consumed between two slcpums calls. By the way, I provided more details on that topic in my blog post Database Call CPU Instrumentation.

The key point here is that the cpu_time of the instrumented OS kernel function is counted twice in the equation 5. To begin with, it is inherently entailed in the wait event duration. And then, it is contained in the fetch_cpu, simply because the wait events are the children of the fetch call. As a result, the sum of the CPU time and the total time spent in the instrumented code needs to be decreased by the amount of time the instrumented code spent on the CPU in order to match the elapsed time. This negative adjustment is unaccounted-for. Simply put, the amount of time that the code instrumented by a wait event spent on the CPU is a negative unaccounted-for:

(7) $\begin{equation*} unaccounted\_for = -\sum_{i \in wait\_events}cpu_i \end{equation*}$

DTrace

I put together an experiment to prove the theory expressed with the equation 7.

First, I had to find an instrumented OS kernel function which is consuming CPU. pread qualifies when using ZFS. Therefore, the database needed to be configured for synchronous IO during this experiment (disk_asynch_io = FALSE).

Then I had to figure out the methodology for measuring the CPU time in the code instrumented by wait events, as Oracle database doesn’t provide that. So, I wrote a DTrace script for this purpose:

#pragma D option quiet 

pid$target:oracle:kslwt_start_snapshot:entry
{
  self->start_ts_cpu = vtimestamp ;
}

pid$target:oracle:kslwt_end_snapshot:entry
/ self->start_ts_cpu /
{
  @time["cpu by wait events"] = sum( vtimestamp - self->start_ts_cpu ) ;
}

END
{
  normalize(@time, 1000000);
  printa("%s: %@d ms\n",@time);
}

The server process calls Oracle functions kslwt_start_snapshot and kslwt_end_snapshot before and after the instrumented code, respectively, to take the current timestamp. The difference of both captured values is the time elapsed in the wait event. Accordingly, the DTrace script measures the CPU time consumed between these two points. By the way, I wrote more on wait event internals in the blog post Tracking Wait Events with DTrace.

The DTrace script above measured 42.848 s CPU time consumed by the Oracle code instrumented with wait events:

cpu by wait events: 42848 ms

The corresponding tkprof output is below:


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1     53.39     175.76    3408598    3408626          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3     53.39     175.76    3408598    3408626          0           1


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  Disk file operations I/O                       59        0.00          0.00
  direct path read                            26669        0.04        164.69

Next, I’m going to apply the values from the tkprof output above to the equation 2:

unaccounted-for = 175.76 – 53.39 – 164.69 = -42.32 s

As you can see, the DTrace script measured unaccounted-for fairly accurately.

Test Environment

The testing was done on a 12.2.0.1.180116 database running on Solaris 11.3.
The database was configured for synchronous IO ( disk_asynch_io = FALSE ). (Please don’t take that as a recommendation. I used the it here just to get the desired effect).

The following query was executed:

select count(*) from big_table where big_table

Summary

In summary, wait events are being used for instrumenting OS kernel functions which are typically waiting on something, like an IO operation, to complete. However, if the code instrumented with a wait event happens to consume some CPU as well, this time will be counted in both fetch CPU time and wait event elapsed time. Consequently, the sum of the fetch CPU time and the total wait event durations will exceed the fetch elapsed time by the amount of time the instrumented code spent on the CPU. This CPU time, i.e. negative unaccounted-for, is not separately instrumented by the Oracle database, but it can be measured with the DTrace script above.

References

The Method R Guide to Mastering Oracle Trace Data, Cary Millsap

Thanks for sharing

Negative Unaccounted-for