Negative Unaccounted-for (2) – Pitfalls

In my previous blog post I described boundary conditions that cause negative unaccounted-for time. The main point is – the CPU time consumed by the code instrumented with a wait event is double counted. To begin with, it’s contained in the CPU time of its parent database call. And then, it’s entailed in the wait event duration. The negative duration of this CPU time is unaccounted-for.

Although I’ve been was focusing only on SQL extended trace so far, there are more performance diagnostic features where you shouldn’t take the reported elapsed and CPU times at face value. In fact, I’d like to make you aware of a few pitfalls related to the negative unaccounted-for in SQL Monitor and ASH.

In doing so, I’m using synchronous IO on ZFS for demonstration purposes. On top of that, I’ve now configured the file system cache, that is ZFS ARC, to make pread completely CPU-bound. The database release is 12.2.0.1.180116 on Solaris 11.3.

tkprof

As you can see in the tkprof output below, the sum of the CPU time (19.34s) and wait events duration (13.26s) exceeds the elapsed time (19.34s).


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.02       0.02          0         63          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1     19.34      19.34    3408598    3408626          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3     19.36      19.36    3408598    3408689          0           1

...

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  SQL*Net message from client                     1        0.00          0.00
  PGA memory operation                            2        0.00          0.00
  direct path read                            26669        0.00         13.26
********************************************************************************

unaccounted-for can be calculated as follows:

unaccounted-for = elapsed – cpu – total_waited = 19.36s – 19.36s – 13.26s = -13.26s

In other words, the whole execution (19.34s) was on the CPU. Besides that, it took 13.26 s (entirely on the CPU) to execute the code instrumented by the wait event “direct path read”, which was double counted.

In conclusion, in the case of a non-zero unaccounted-for, it’s wrong to think of elapsed time as the sum of the CPU time and wait event durations.

By the way, sophisticated profilers, such as Method-R Profiler, report the unaccounted-for.

SQL Monitor

Below is the excerpt from the corresponding SQL Monitor report:

...
Global Information
------------------------------
...
Duration            :  20s
...
Global Stats
=================================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read  | Read  |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs  | Bytes |
=================================================================
|      33 |      19 |       13 |     1 |     3M | 26669 |  26GB |
=================================================================
...

Obviously, “Elapsed Time(s)” in the “Global Stats” section is calculated as the sum of “CPU Time(s)” and “IO Wait(s)” and therefore misleading. Use “Duration” in the “Global Information” instead.

ASH

With the following query we’ll verify how the samples are categorized in v$active_session_history:

column event format a18                                                                                                               
select session_state,event, count(*) 
  from v$active_session_history 
  where sql_id = '4hqv20rbt8dck' and
    sql_exec_id = ( 
     select max(sql_exec_id) from v$active_session_history 
       where sql_id = '4hqv20rbt8dck'
   )
  group by event,session_state ;                                                                                                       

SESSION EVENT                COUNT(*)
------- ------------------ ----------
WAITING direct path read           14
ON CPU                              5

Expectedly, there isn’t double counting, as there’s just one sample per second whose session state can be either WAITING or ON CPU. It’s worth noting that the samples taken during the execution of the code instrumented by a wait event were categorized as WAITING, whether the process was on the CPU or not.
Consequently, the CPU time based on v$active_session_history can’t be calculated correctly in the case of a negative unaccounted-for.

Thanks for sharing

Nenad Noveljic

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.