Performance Problem with Incremental Statistics Calculation

There’s a performance problem related to the calculation of incremental statistics. I observed it on a 19.7.0.0.200414 database. The problem can become recurring, because synopses are invalidated after the creation of new histograms, which can happen whenever statistics are gathered with the METHOD_OPT column size AUTO.
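
As a side note, the METHOD_OPT preference currently in effect can be checked with dbms_stats.get_prefs; its default is SIZE AUTO, which is exactly the setting that lets new histograms appear:

select dbms_stats.get_prefs('METHOD_OPT') method_opt from dual ;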

Repeated SQL execution

I’ll create a table with several partitions and columns, and configure incremental statistics:

drop table t ;

create table t ( 
  n1 integer, n2 integer, n3 integer, n4 integer, n5 integer, n6 integer,
  n7 integer, n8 integer, n9 integer, n10 integer, n11 integer 
) partition by range (n1)
(
  partition t_id_1 values less than (2),
  partition t_id_2 values less than (3),
  partition t_id_3 values less than (4),
  partition t_id_4 values less than (5),
  partition t_id_5 values less than (6),
  partition t_id_6 values less than (7),
  partition t_id_7 values less than (8)
);

exec dbms_stats.set_table_prefs(null,'t','incremental','true') ;
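
Optionally, a few rows can be inserted so that every partition contains some data; this is just a minimal sketch and not required for reproducing the behavior:

insert into t
  select level, level, level, level, level, level,
         level, level, level, level, level
    from dual connect by level <= 7 ;
commit ;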

gather_table_stats calls the query below (sql_id 35c8afbgfm40c) many times:

exec dbms_stats.gather_table_stats(null, 't');
select executions from v$sql where sql_id = '35c8afbgfm40c' ;

EXECUTIONS
----------
88

SELECT NVL (SUM (SPARE1), 0), COUNT (*)
  FROM WRI$_OPTSTAT_SYNOPSIS_HEAD$ H,
       (SELECT MAX (ANALYZETIME) MA
          FROM WRI$_OPTSTAT_SYNOPSIS_HEAD$
         WHERE BO# = :B1) V
 WHERE H.ANALYZETIME = V.MA AND H.BO# = :B1;

The query was executed 88 times: once for each of the 11 columns in each of the 7 partitions (77 executions), plus some additional executions. I figured that out by combining the DBMS_STATS and SQL traces. The following line appears whenever the statistics for a column are calculated:

DBMS_STATS: APPROX_NDV_ALGORITHM chosen: HLL in incremental (recent stats use HLL)

The SQL statement above is executed with each appearance of this line.
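
For reference, this is roughly how both traces can be enabled in a test session. The DBMS_STATS TRACE preference is undocumented, and the value below is just a commonly cited “trace everything” setting, so treat it as an assumption and verify it on your release:

-- SQL trace for the current session
exec dbms_monitor.session_trace_enable(waits => true, binds => true)

-- undocumented DBMS_STATS tracing; 1048575 is a commonly cited "all flags" value
exec dbms_stats.set_global_prefs('TRACE', to_char(1048575))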

wri$_optstat_synopsis_head$.bo# contains the object_id of the table, and the query was always executed for the same object. In my opinion, this query should be factored out of the column statistics calculation; gather_table_stats should instead execute it only once per table and cache the result, as sketched below.
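
To illustrate the idea, here is a purely hypothetical sketch of such caching: the aggregate is fetched once per table and then reused for every column. It reads the SYS-owned table directly, so it needs a suitably privileged user, and it is of course not how DBMS_STATS is actually implemented:

declare
  l_sum number;
  l_cnt number;
  l_bo# number;
begin
  select object_id into l_bo#
    from user_objects
   where object_name = 'T' and object_type = 'TABLE';

  -- executed only once per table ...
  select nvl(sum(h.spare1), 0), count(*)
    into l_sum, l_cnt
    from sys.wri$_optstat_synopsis_head$ h,
         (select max(analyzetime) ma
            from sys.wri$_optstat_synopsis_head$
           where bo# = l_bo#) v
   where h.analyzetime = v.ma and h.bo# = l_bo#;

  -- ... and the cached values reused in each per-column iteration
  for c in (select column_name from user_tab_columns where table_name = 'T') loop
    dbms_output.put_line(c.column_name || ': ' || l_sum || '/' || l_cnt);
  end loop;
end;
/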

Stale partition

What happens when a partition becomes stale?
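
To provoke this, the partition can, for example, be modified and the monitoring info flushed so that the staleness becomes visible (a sketch, the row count is arbitrary):

insert into t (n1) select 2 from dual connect by level <= 1000 ;
commit ;
exec dbms_stats.flush_database_monitoring_info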

select partition_name
  from user_tab_statistics
  where table_name = 'T' and stale_stats = 'YES' ;

PARTITION_NAME
--------------------------------------------------------------------------------
T_ID_2

-- don't run this in production (flushed here only to reset the execution counter for the demo)
alter system flush shared_pool ;

exec dbms_stats.gather_table_stats(null, 't');

select executions from v$sql where sql_id = '35c8afbgfm40c' ;

EXECUTIONS
----------
        19

As expected, the query still executes once per column, but only for the partition with stale statistics.
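
The result can be double-checked, for instance, by looking at the staleness and the last analyze time per partition:

select partition_name, stale_stats, last_analyzed
  from user_tab_statistics
  where table_name = 'T'
  order by partition_position ;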

Full table scan

The frequent executions become a problem when the table WRI$_OPTSTAT_SYNOPSIS_HEAD$ grows huge. I originally noticed this when the synopses were calculated for a flashback archive table with 230 columns and around 700 partitions. The query had to be executed more than 161,000 times (230 columns × 700 partitions)!

The execution plan had two full table scans:

SQL Plan Monitoring Details (Plan Hash Value=2514490667)
=========================================================================================================================================================================
| Id |        Operation        |            Name             |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity |   Activity Detail    |
|    |                         |                             | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |     (# samples)      |
=========================================================================================================================================================================
|  0 | SELECT STATEMENT        |                             |         |       |         1 |     +3 |     1 |        1 |      |       |          |                      |
|  1 |   SORT AGGREGATE        |                             |       1 |       |         1 |     +3 |     1 |        1 |      |       |          |                      |
|  2 |    NESTED LOOPS         |                             |      39 | 46233 |         1 |     +3 |     1 |       33 |      |       |          |                      |
|  3 |     VIEW                |                             |       1 | 23113 |         1 |     +3 |     1 |        1 |      |       |          |                      |
|  4 |      SORT AGGREGATE     |                             |       1 |       |         1 |     +3 |     1 |        1 |      |       |          |                      |
|  5 |       TABLE ACCESS FULL | WRI$_OPTSTAT_SYNOPSIS_HEAD$ |    252K | 23113 |         3 |     +1 |     1 |     237K | 3278 |   2GB |    66.67 | direct path read (2) |
|  6 |     TABLE ACCESS FULL   | WRI$_OPTSTAT_SYNOPSIS_HEAD$ |      39 | 23120 |         1 |     +3 |     1 |       33 | 3278 |   2GB |    33.33 | direct path read (1) |
=========================================================================================================================================================================

Notice that the data wasn’t read from the buffer cache: the table didn’t qualify as a small table, so the buffer cache was bypassed and direct path reads were performed instead.
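
For reference, the size of the synopsis head segment can be checked in dba_segments; the Read Bytes column above already suggests it was around 2 GB here:

select round(bytes/1024/1024/1024, 1) gb
  from dba_segments
  where owner = 'SYS' and segment_name = 'WRI$_OPTSTAT_SYNOPSIS_HEAD$' ;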

The full table scan was chosen because the plan had been optimized for a popular value in the bo# histogram. After dropping the histogram on bo#, I got a plan that uses the index:

begin
  -- prevent the histogram on bo# from being re-created at the next statistics gathering
  dbms_stats.set_table_prefs(
    'sys', 'wri$_optstat_synopsis_head$', 'method_opt',
    'for all columns size 1'
  );
end;
/

begin
  -- drop the existing histogram on bo#
  dbms_stats.delete_column_stats(
    ownname=>'sys', tabname=>'wri$_optstat_synopsis_head$', colname=>'bo#',
    col_stat_type=>'histogram'
  );
end;
/

Plan hash value: 1332769619
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                             |     1 |    26 |  2203   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE                  |                             |     1 |    26 |            |          |
|   2 |   NESTED LOOPS                   |                             |     1 |    26 |  2203   (1)| 00:00:01 |
|   3 |    NESTED LOOPS                  |                             |  2206 |    26 |  2203   (1)| 00:00:01 |
|   4 |     VIEW                         |                             |     1 |     9 |  1102   (1)| 00:00:01 |
|   5 |      SORT AGGREGATE              |                             |     1 |    14 |            |          |
|   6 |       TABLE ACCESS BY INDEX ROWID| WRI$_OPTSTAT_SYNOPSIS_HEAD$ |  2206 | 30884 |  1102   (1)| 00:00:01 |
|*  7 |        INDEX RANGE SCAN          | I_WRI$_OPTSTAT_SYNOPHEAD    |  2206 |       |    13   (0)| 00:00:01 |
|*  8 |     INDEX RANGE SCAN             | I_WRI$_OPTSTAT_SYNOPHEAD    |  2206 |       |    12   (0)| 00:00:01 |
|*  9 |    TABLE ACCESS BY INDEX ROWID   | WRI$_OPTSTAT_SYNOPSIS_HEAD$ |     1 |    17 |  1101   (1)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   7 - access("BO#"=TO_NUMBER(:B1))
   8 - access("H"."BO#"=TO_NUMBER(:B1))
   9 - filter("H"."ANALYZETIME"="V"."MA")

The new execution plan reduced the average elapsed time per execution by almost 90%:

select plan_hash_value,sum(executions),
  round(sum(elapsed_time)/sum(executions)) avg 
  from v$sql where sql_id='35c8afbgfm40c' group by plan_hash_value ;

PLAN_HASH_VALUE SUM(EXECUTIONS)        AVG
--------------- --------------- ----------
     2514490667           13468    2232185
     1332769619            7108     262313

Known bugs

There are many bugs reported related to slow incremental statistics gathering. The following bug note mentions the same sql_id: Bug 29555844 : 12.2 SLOW STATS GATHERING. Unfortunately, this bug hasn’t been resolved yet.

Summary

In summary, dbms_stats calls the same SQL statement too many times instead of executing it just once per table and caching the result. This becomes a problem as the table that stores synopses grows. The problem gets bigger if the optimizer chooses a full table scan instead of an index range scan, and it becomes huge if the database decides to do direct path reads. In that case, we end up with hundreds of thousands of executions bypassing the buffer cache.

It’s possible to drop the histogram on BO# to get an execution plan with an index range scan. This alleviates the problem but doesn’t resolve it completely: because of the many executions, the total elapsed time is still horrendous.

Having said that, introducing incremental statistics requires thorough planning, especially for tables with many partitions and columns.

Thanks for sharing

Nenad Noveljic

8 Comments

  1. Thanks Nenad for this article and the previous one in the series. At my current client we have several applications spending a lot of time in their statistics gathering process. One of my next actions (when all the planned applications have finished their 19c upgrade) is to look carefully at the way incremental mode, partitioning and histograms work together. And your work has already paved the way for my investigation.

    • You’re welcome. I’ve learned a lot from you about the optimizer, and I’m glad my articles might be helpful.

  2. “In summary, dbms_stats calls the same SQL too many times, instead just once and cache the result.”

    Would it be possible to add a result_cache hint, with a SQL patch, to the SQL called by DBMS_STATS?

    • Not only would that be unsupported, but it would also be incorrect, as the result of the query might change while the program iterates over the columns; the cached result would then be obsolete. By “cache the result” I actually meant maintaining the intermediate result in memory instead of querying the database in each iteration.

  3. If you have a partitioned table and leave your METHOD_OPT as AUTO, you’re asking for stats gather problems, which will then be compounded by using INCREMENTAL.

    All it takes is some random one-off query making the stats gather think there should be a new histogram and you’re in trouble.

    With partitioned tables, I always recommend having explicitly generated histograms (to be fair, I’m not a fan of AUTO on non-partitioned tables either; Oracle creates way too many histograms from Oracle 12 onwards, and I don’t care if they are cheap).
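
    For example, an explicit preference could look something like this (the table and column names are just placeholders):

    exec dbms_stats.set_table_prefs(user, 't', 'method_opt', 'for all columns size 1 for columns n2 size 254')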

  4. Creating the following index can also help to improve the query response:
    Create index sys.OPTSTAT_SYNOPSIS_HEAD_IDX on WRI$_OPTSTAT_SYNOPSIS_HEAD$ (BO#, ANALYZETIME) parallel 8;

    • Mohammad, thanks so much!!

      That index worked as a workaround in a 150 TB database that was not able to gather incremental stats due to this bug.

      We will apply the patch, but your index just gave us a few more days to live.

      Thanks!!
