Active Session History

I was almost fooled by missing data while doing a performance analysis based on active session history (ASH).

The application reported a problem on an Oracle 19c database: A PL/SQL procedure call took an hour and a half instead of 5 minutes to complete.

The active session history looked strange:

@dashtop.sql top_level_sql_id,sql_exec_id "session_id = 997 and session_serial# = 58505" "timestamp '2021-02-26 05:36:37'" "timestamp '2021-02-26 08:03:09'"
    Total
  Seconds     AAS %This   TOP_LEVEL_SQL SQL_EXEC_ID FIRST_SEEN          LAST_SEEN
--------- ------- ------- ------------- ----------- ------------------- -------------------
      290      .0   97%   289q4500mqm43    16777216 2021-02-26 05:36:40 2021-02-26 08:03:06

SQL> @https://github.com/tanelpoder/tpt-oracle/blob/master/awr/awr_sqlid.sql 289q4500mqm43

      DBID SQL_ID        SQL_TEXT                                                                                                                                                                                                 COMMAND_TYPE   CON_DBID     CON_ID
---------- ------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------ ---------- ----------
 771238284 289q4500mqm43 BEGIN package.init_procedure; END;                                                                                                                                                    47  771238284          0

If you counted samples for estimating the database time, you could conclude that the session spent only a fraction of the total time in the database (290s out of 87 minutes). But during the whole time period between 05:36:40 and 08:03:06, the session was executing a single PL/SQL procedure call. Consequently, the time must have been spent in the database, but the ASH samples were missing.

Flashback Data Archive

Fortunately, the server dedicated process left some information in its trace. It interacted with the Flashback Data Archive (FBDA) background process:

*** SESSION ID:(997.58505) 2021-02-26T05:36:58.306653+01:00
...
*** 2021-02-26T05:37:20.673061+01:00
...
ktfaTrace: ktfacmtcbk:12071 =Inline Propagate large:19.22.4011802:changes:903, undoblks:17
...
*** 2021-02-26T08:03:08.617529+01:00
ktfaTrace: ktfa_slave_schedule:12387 =FBDA Slave called
ktfaTrace: ktfa_slave_schedule:12390 =FDA slave task scheduled
ktfaTrace: ktfa_slave_schedule:12392 =ktfa_slave_schedule:ktsj_status=0

I inspected the PL/SQL procedure, and identified the suspects: truncate and table statistics gathering.

fbar timer

I setup a test case with FBDA:

create flashback archive fba1 tablespace tools retention 1 year;
create table t (n integer) flashback archive fba1;

First, I suspended the fbda process:

ps -ef | grep fbda | grep DB1
oracle   18082 17800   0   Dec 28 ?           2:23 ora_fbda_DB1

oradebug setospid 18082
oradebug suspend

Then I ran truncate:

truncate table t ;

Truncate was hanging without leaving any samples in ASH:

@ashtop.sql event2 session_id=815 &1min

no rows selected

So I reproduced a variant of this problem.

But what was the process doing?

It was waiting on fbar timer:

column event format a15
column wait_class format a15

select event,wait_class,status,seconds_in_wait from v$session where sid=815 ;

EVENT           WAIT_CLASS      STATUS   SECONDS_IN_WAIT
--------------- --------------- -------- ---------------
fbar timer      Idle            ACTIVE               546

fbar timer isn’t documented, but I guessed its meaning by looking at the stack of the server dedicated process hanging in truncate:

pstack 27109
27109:  oracleDB1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
 00007fffa617741a portfs   (5, 4, 7fffbffe3480, 0, 1dcd6500, 7fffbffe34a0)
 000000000622c44d sskgpwwait () + dd
 000000000622c033 skgpwwait () + c3
 000000000642e664 ksliwat () + 9b4
 000000000642d84f kslwaitctx () + af
 000000000a7a3a58 ktfa_do_prepare_ddl () + 1b8
 00000000069d448c ktfa_check_ddl_fda_tables () + 179c
 0000000006947ce6 opiSem () + 696
 00000000069cbaaf opiprs () + 11f
 00000000069ca3ff kksParseChildCursor () + 1ff

The server dedicated process is waiting on a message (from fbda). Apparently, there’s some communication between the server dedicated process and fbda in the case of DDLs. Consequently, if fbda is doing something else and doesn’t send the message, or the message gets lost, the user session will keep hanging. On top of that, this time isn’t visible in ASH.

Why is fbar timer not visible in ASH?

Because, it’s wrongly classified as “Idle”:

select wait_class from v$event_name where name = 'fbar timer' ;

WAIT_CLASS
----------------------------------------------------------------
Idle

In contrast, Tanel Poder’s Snapper highlights the problem:

@snapper all 10 1 815
Sampling SID 815 with interval 10 seconds, taking 1 snapshots...

-- Session Snapper v4.30 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    815, USER1     , STAT, in call idle wait time                                    ,          1001,      99.53,         ,             ,          ,           ,          ~ per execution
    815, USER1     , WAIT, fbar timer                                                ,      20002541,      1.99s,   198.9%, [WWWWWWWWWW],         0,          0,        20s average wait

--  End of Stats snap 1, end=2021-03-15 15:39:47, seconds=10.1

    

--  End of ASH snap 1, end=2021-03-15 15:39:47, seconds=10, samples_taken=92, AAS=0

When truncate completes (after issuing “oradebug resume” in this test case) tkprof will display the wait and elapsed time correctly:

truncate table t


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.82    2757.71          0          0          0           0
Execute      1      0.00       4.50          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.82    2762.22          0          0          0           0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 88

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  fbar timer                                      6     2757.69       2762.21
  PGA memory operation                            1        0.00          0.00
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1        0.00          0.00
********************************************************************************

Bugs

Below is the list of the open bugs which cause the user session to wait on fbar timer:

Bug 31667263 : STATS HANG WAITING AS FINAL BLOCKER FOR ‘FBAR TIMER’
Bug 31625755 : FBDA DISABLE SLOW WAITING FOR ‘FBAR TIMER’ FOR HUGE NUMBER OF FBDA TABLES
Bug 31031183 : FBDA: HANG IN ‘FBAR TIMER'<=’ENQ: TX – CONTENTION’ WITH HIGH GG EXTRACT LOAD
Bug 32587508 : ORA-1 ERROR WHILE MERGING INTO SYS_FBA_TCRV_% TABLE

Summary

In the case of DDL on a table with a Flashback Data Archive (FBDA), the dedicated server process communicates with the FBDA background process. Waiting on the message is instrumented with the undocumented fbar timer wait event. As this wait event is classified as Idle, it won’t appear in the active session history. The unaccounted database time is a consequence. Therefore, be careful when estimating database time by counting ASH samples when tables with FBDA are involved. SQL trace or Tanel Poder’s Snapper will show the correct information once the issue reappears.

There are several open bugs related to fbar timer. Check the bug descriptions to see if they are relevant for your case.

Thanks for sharing

2 Comments


March 21, 2021
Victor Torres

Never thought about it.. Any idea why would FBA Timer an Idle event?
I have to assume there has to be a reason for that.

Now I will be all paranoid every time I don’t see the info in ASH 😛
- Reply
  
  March 22, 2021
  Nenad Noveljic
  
  It doesn’t seem right to me that fbar timer is classified as an idle event.
  
  I implemented a check that that monitors fbar timer based on v$session to compensate for missing ASH samples.
  
  Additional data collection with SQL trace or snapper would highlight the problem.

fbar timer – Active Session History Instrumentation Gap