In my previous blog post I described boundary conditions that cause negative unaccounted-for time. The main point is – the CPU time consumed by the code instrumented with a wait event is double counted. To begin with, it’s contained in the CPU time of its parent database call. And then, it’s entailed in the wait event duration. The negative duration of this CPU time is unaccounted-for.
Although I’ve been was focusing only on SQL extended trace so far, there are more performance diagnostic features where you shouldn’t take the reported elapsed and CPU times at face value. In fact, I’d like to make you aware of a few pitfalls related to the negative unaccounted-for in SQL Monitor and ASH.
In doing so, I’m using synchronous IO on ZFS for demonstration purposes. On top of that, I’ve now configured the file system cache, that is ZFS ARC, to make pread completely CPU-bound. The database release is 12.2.0.1.180116 on Solaris 11.3.
tkprof
As you can see in the tkprof output below, the sum of the CPU time (19.34s) and wait events duration (13.26s) exceeds the elapsed time (19.34s).
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.02 0.02 0 63 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 19.34 19.34 3408598 3408626 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 19.36 19.36 3408598 3408689 0 1
...
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
SQL*Net message from client 1 0.00 0.00
PGA memory operation 2 0.00 0.00
direct path read 26669 0.00 13.26
********************************************************************************
unaccounted-for can be calculated as follows:
unaccounted-for = elapsed – cpu – total_waited = 19.36s – 19.36s – 13.26s = -13.26s
In other words, the whole execution (19.34s) was on the CPU. Besides that, it took 13.26 s (entirely on the CPU) to execute the code instrumented by the wait event “direct path read”, which was double counted.
In conclusion, in the case of a non-zero unaccounted-for, it’s wrong to think of elapsed time as the sum of the CPU time and wait event durations.
By the way, sophisticated profilers, such as Method-R Profiler, report the unaccounted-for.
SQL Monitor
Below is the excerpt from the corresponding SQL Monitor report:
...
Global Information
------------------------------
...
Duration : 20s
...
Global Stats
=================================================================
| Elapsed | Cpu | IO | Fetch | Buffer | Read | Read |
| Time(s) | Time(s) | Waits(s) | Calls | Gets | Reqs | Bytes |
=================================================================
| 33 | 19 | 13 | 1 | 3M | 26669 | 26GB |
=================================================================
...
Obviously, “Elapsed Time(s)” in the “Global Stats” section is calculated as the sum of “CPU Time(s)” and “IO Wait(s)” and therefore misleading. Use “Duration” in the “Global Information” instead.
ASH
With the following query we’ll verify how the samples are categorized in v$active_session_history:
column event format a18
select session_state,event, count(*)
from v$active_session_history
where sql_id = '4hqv20rbt8dck' and
sql_exec_id = (
select max(sql_exec_id) from v$active_session_history
where sql_id = '4hqv20rbt8dck'
)
group by event,session_state ;
SESSION EVENT COUNT(*)
------- ------------------ ----------
WAITING direct path read 14
ON CPU 5
Expectedly, there isn’t double counting, as there’s just one sample per second whose session state can be either WAITING or ON CPU. It’s worth noting that the samples taken during the execution of the code instrumented by a wait event were categorized as WAITING, whether the process was on the CPU or not.
Consequently, the CPU time based on v$active_session_history can’t be calculated correctly in the case of a negative unaccounted-for.
One Comment