This blog post is inspired by John Beresniewicz’s presentation DB Time, Average Active Sessions, and ASH Math.
In a nutshell, DB time can be estimated with the sum(usecs_per_row) over an observed time period, where usecs_per_row is the distance between the adjacent samples in active session history (ASH). Average active sessions (AAS) can then be calculated as DB Time / elapsed time.
John also wrote two queries for estimating DB time and AAS based on ASH samples and comparing the result with more precise time model statistics: ASHmetricsAAS.sql and ASH_TImeModel_AAS.sql, respectively.
Generally, I’ve got very good results while experimenting with John’s queries. For example, let’s consider a workload where multiple concurrent sessions are executing a SQL in a PL/SQL loop:
for i in {1..30}
do
sqlplus /nolog << EOD &
connect / as sysdba
declare
l integer ;
begin
for i in 1..1e9
loop
select count(*) into l from dba_objects,dba_objects ;
end loop;
end ;
/
EOD
done
We can see that the estimate is perfect:
@ASHmetricsAAS.sql
END_TIME MDBTIME ASHDBTIME ASHCOUNT
------------------- ---------- ---------- ----------
2020-03-20 11:29:49 29.8652114 30 1802
2020-03-20 11:30:48 30.3782081 30 1774
2020-03-20 11:31:48 29.9119818 30 1802
@ASH_TImeModel_AAS.sql
END_TIME DBTIME_PCTDIFF ASH_AAS TIMEMODEL_AAS
------------------- -------------- ---------- -------------
2020-03-20 11:29:49 0 29.9 29.9
2020-03-20 11:30:48 1.6 29.9 30.4
2020-03-20 11:31:48 0 29.9 29.9
(John’s scripts also estimate the CPU time, but I removed this part from the output.)
However, I’ve also noticed some discrepancies, whose root cause could be tracked down to a large number of logon operations.
I’ll demonstrate this by producing a logon storm:
for j in {1..10000}
do
for i in {1..30}
do
sqlplus /nolog << EOD &
connect / as sysdba
EOD
done
done
The ASH based estimate is completely unreliable – only around 1% of the total DB time is recorded in ASH:
@ASHmetricsAAS.sql
END_TIME MDBTIME ASHDBTIME ASHCOUNT
------------------- ---------- ---------- ----------
2020-03-20 11:38:54 95.5325744 .916666667 57
2020-03-20 11:39:53 70.7569325 .440677966 31
2020-03-20 11:40:54 57.5036575 .803278689 54
@ASH_TImeModel_AAS.sql
END_TIME DBTIME_PCTDIFF ASH_AAS TIMEMODEL_AAS
------------------- -------------- ---------- -------------
2020-03-20 11:38:54 99.1 .9 95.5
2020-03-20 11:39:53 99.4 .4 70.8
2020-03-20 11:40:54 98.6 .8 57.5
Where does this discrepancy come from?
Most of the time is consumed while the session hasn’t been created yet. This time will be added to the total DB time, but as the entry in v$session exists only for a tiny fraction of the total logon time, there will be only very few ASH samples captured.
I found a way to measure the exact fraction of the time during which the session exists and is active.
First, I’m stopping the dedicated server process immediately after forking. I’m using the stop_forked.d script running in the background. I described this script in my blog post Debugging Session Creation.
/* stop_forked.d */
#pragma D option quiet
proc:::exec
/ execname == "sqlplus" /
{
self->parent = pid ;
}
proc:::exec-success
/ self->parent &&
strtok(strtok(curpsinfo->pr_psargs," "),"oracle") == $1 /
{
printf("DB: %s, pid: %d \n",
strtok(strtok(curpsinfo->pr_psargs," "),"oracle"),
pid
);
stop();
}
The script writes out the ID of the just forked dedicated server process.
sudo -u root /usr/sbin/dtrace -w -s stop_forked.d '"DB1"'
DB: DB1, pid: 2083
The dedicated server process will resume after starting the script measure_login_time.d.
/* measure_login_time.d */
#pragma D option quiet
pid$target::main:entry
{
self->started = timestamp;
}
pid$target::kzsini1:entry
{
self->session_started = timestamp;
}
pid$target::kpoauth:return
{
this->completed = timestamp;
printf("Total time [ns]: %d\n", this->completed - self->started);
printf("Session time [ns]: %d\n", this->completed - self->session_started);
}
sudo -u root /usr/sbin/dtrace -s measure_login_time.d -p 2083
The script above measures two elapsed times:
- Total logon time, which is the elapsed time between entering main and kzsini1. kzsini1 is the Oracle C function which initializes the session and creates the entry in v$session.
- Session time is the time during which the session exists and is marked as active. It is measured as the elapsed time between entering main and exiting kpoauth. kpoauth is the Oracle C function that handles authentication.
A sample output is below:
Total time [ns]: 337947071
Session time [ns]: 4764305
As you can see, the total time of the session is 1.4%, which matches the output of the ASH_TImeModel_AAS.sql script and explains the difference between the time model and ASH.
In conclusion, a large positive difference between the actual DB time and its ASH estimate might indicate that a substantial amount of processing time is spent on logons, due to, for example, ill-configured connection pools, or even lack thereof.
This discrepancy isn’t an anomaly – it’s how ASH is designed to work, as it records just the active sessions. Therefore, it’s good to be aware of the activities that are contributing to the total DB time, but don’t result in ASH samples.