Stale optimizer statistics

For efficiency, optimizer statistics should be gathered only after becoming stale. Traditionally, “stale” means that a significant fraction of data in a table changed, so that statistics don’t accurately describe data anymore. I was surprised when I saw cases in Oracle 19c when statistics were gathered even when they weren’t marked as “stale”. The problem got worse in 21c.

I’m using use the test case below to demonstrate the point.

All we need is a simple table:

drop table t ;

create table t ( n integer ) ;

insert into t select rownum from dual connect by level <= 100 ;

begin
  dbms_stats.gather_table_stats( 
    ownname => null, tabname => 'T' ) ;
end ;
/

alter session set nls_date_format='dd.mm.yyyy hh24:mi:ss' ;

I’m inserting a new row:

insert into t values (1);

Since the table is monitored, the total number of INSERTS increased in the tracking table MON_MODS_ALL$:

exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO ;

select inserts from sys.mon_mods_all$ m, user_objects o
  where o.object_name = 'T' and o.object_id = m.obj# ;

   INSERTS
----------
         1

Numbers in MON_MODS_ALL$ form the basis for the calculation that marks statistics as “stale”.

Since only 1% of all rows where changed, the statistics aren’t marked as stale:

select last_analyzed, stale_stats from user_tab_statistics 
  where table_name = 'T' ;

LAST_ANALYZED       STALE_S
------------------- -------
12.11.2021 16:12:57 NO

Therefore, gathering of stale schema statistics will not update our table’s statistics:

exec dbms_session.sleep(1) ;

begin
  dbms_stats.gather_schema_stats( 
    ownname => null, options => 'gather stale',
    method_opt => 'FOR ALL COLUMNS SIZE 1' 
    --method_opt => 'FOR ALL COLUMNS SIZE REPEAT' ) ;
end ;
/

select last_analyzed, stale_stats from user_tab_statistics 
  where table_name = 'T' ;

LAST_ANALYZED       STALE_S
------------------- -------
12.11.2021 16:12:57 NO

Histograms

Now we will query the table based on an equality predicate:

select * from t where n = 1 ;

exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO ;

Oracle recorded the usage of the equality predicate:

select c.equality_preds, c.timestamp from sys.col_usage$ c, user_objects o 
  where o.object_name = 'T' and o.object_id = c.obj# ;

EQUALITY_PREDS TIMESTAMP
-------------- -------------------
             1 12.11.2021 16:17:07

The statistics are still stale because we haven’t changed many rows:

LAST_ANALYZED       STALE_S
------------------- -------
12.11.2021 16:12:57 NO

Since we are gathering the optimizer statistics with the options “gather stale”, we don’t expect that GATHER_SCHEMA_STATS collects the statistics for our table:

--column usage timestamp must be greater than analyze time
exec dbms_session.sleep(1) ;

begin
  dbms_stats.gather_schema_stats( 
    ownname => null, options => 'gather stale',
    method_opt => 'FOR ALL COLUMNS SIZE 1' 
    --method_opt => 'FOR ALL COLUMNS SIZE REPEAT' ) ;
end ;
/

But it did:

select last_analyzed, stale_stats from user_tab_statistics 
  where table_name = 'T' ;

LAST_ANALYZED       STALE_S
------------------- -------
12.11.2021 16:20:19 NO

Obviously, the column usage in predicates triggered statistic collection, even though the statistics weren’t stale per se.

As a result, a new histogram was created on a column which queried by a literal value:

select histogram from user_tab_col_statistics where table_name = 'T' ;

HISTOGRAM
---------------
FREQUENCY

This behavior is reproducible in the version 21.4 but not in 19.13. You can run it as a script as well.

Internals

SQL trace on statistics collection revealed what happened.

Tables with stale statistics are selected based on two criteria. One is the well known percentage of changed rows. The other is MON_MODS_ALL$.FLAGS. FLAGS isn’t documented. The set third bit triggers statistics collection.

This bit is set by the following merge executed by DMBS_STATS.GATHER_SCHEMA_STATS in Oracle 21c:

MERGE /*+ dynamic_sampling(4) dynamic_sampling_est_cdn  */ INTO SYS.MON_MODS_ALL$ MM USING ( SELECT OBJ#, SUM(FLGS) FLGS FROM ( SELECT OBJ#, :B2 FLGS FROM ( SELECT TAB.OBJ# FROM (SELECT T.OBJ# OBJ#, T.OBJ# BO#, T.ANALYZETIME FROM SYS.TAB$ T WHERE BITAND(T.FLAGS,16) = 16 UNION ALL SELECT T.OBJ# OBJ#, T.BO# BO#, T.ANALYZETIME FROM SYS.TABPART$ T WHERE BITAND(T.FLAGS,2) = 2 UNION ALL SELECT T.OBJ# OBJ#, T.BO# BO#, T.ANALYZETIME FROM SYS.TABCOMPART$ T WHERE BITAND(T.FLAGS,2) = 2 UNION ALL SELECT T.OBJ# OBJ#, TCP.BO# BO#, T.ANALYZETIME FROM SYS.TABSUBPART$ T, SYS.TABCOMPART$ TCP WHERE BITAND(T.FLAGS,2) = 2 AND T.POBJ# = TCP.OBJ# ) TAB, SYS.OBJ$ O, SYS.USER$ U WHERE TAB.BO# = O.OBJ# AND TAB.ANALYZETIME < O.MTIME AND O.OWNER# = U.USER# AND ((:B1 IS NULL) OR (U.NAME = :B1 )) AND EXISTS (SELECT 1 FROM SYS.COL$ C WHERE C.OBJ# = TAB.BO# AND C.TYPE# IN ( 1, 2, 12, 23, 69, 96, 100, 101, 178, 179, 180, 181, 182, 183, 231) AND BITAND(C.PROPERTY, 16384+32768) = 0 AND NOT (BITAND(C.PROPERTY, 32+65536+131072) = 32+65536 AND C.DEFAULT$ IS NULL ) AND NOT EXISTS (SELECT NULL FROM SYS.HIST_HEAD$ HH WHERE HH.OBJ# = TAB.OBJ# AND HH.INTCOL# = C.INTCOL#) )) UNION ALL SELECT TAB.OBJ# OBJ#, :B3 FLGS FROM (SELECT T.OBJ# OBJ#, T.OBJ# BO#, T.ANALYZETIME, T.ROWCNT FROM SYS.TAB$ T WHERE BITAND(T.FLAGS,16) = 16 UNION ALL SELECT T.OBJ# OBJ#, T.BO# BO#, T.ANALYZETIME, T.ROWCNT FROM SYS.TABPART$ T WHERE BITAND(T.FLAGS,2) = 2 UNION ALL SELECT T.OBJ# OBJ#, T.BO# BO#, T.ANALYZETIME, T.ROWCNT FROM SYS.TABCOMPART$ T WHERE BITAND(T.FLAGS,2) = 2 UNION ALL SELECT T.OBJ# OBJ#, TCP.BO# BO#, T.ANALYZETIME, T.ROWCNT FROM SYS.TABSUBPART$ T, SYS.TABCOMPART$ TCP WHERE BITAND(T.FLAGS,2) = 2 AND T.POBJ# = TCP.OBJ# ) TAB, SYS.OBJ$ O, SYS.USER$ U, SYS.OPTSTAT_USER_PREFS$ P1 WHERE TAB.BO# = O.OBJ# AND O.OWNER# = U.USER# AND ((:B1 IS NULL) OR (U.NAME = :B1 )) AND TAB.BO# = P1.OBJ#(+) AND P1.PNAME(+) = 'METHOD_OPT' AND EXISTS (
SELECT NULL FROM SYS.HIST_HEAD$ HH, SYS.COL_USAGE$ CU WHERE CU.OBJ# = TAB.BO# AND HH.OBJ# = TAB.OBJ# AND HH.INTCOL# = CU.INTCOL# AND BITAND(HH.SPARE2, 16) = 0 AND NOT (NVL(HH.ROW_CNT, 0) > 0 OR HH.NULL_CNT = TAB.ROWCNT) AND (CASE WHEN TAB.ANALYZETIME < CU.TIMESTAMP THEN (EQUALITY_PREDS + EQUIJOIN_PREDS + RANGE_PREDS + LIKE_PREDS + NONEQUIJOIN_PREDS) ELSE 0 END) > 0) AND EXISTS (SELECT 1 FROM SYS.COL$ C WHERE C.OBJ# = TAB.BO# AND C.TYPE# IN ( 1, 2, 12, 23, 69, 96, 100, 101, 178, 179, 180, 181, 182, 183, 231) AND BITAND(C.PROPERTY, 16384+32768) = 0 AND NOT (BITAND(C.PROPERTY, 32+65536+131072) = 32+65536 AND C.DEFAULT$ IS NULL ) AND DBMS_STATS.COLUMN_NEED_HIST(U.NAME, O.NAME, C.NAME, NVL(P1.VALCHAR, :B4 )) = 1) UNION ALL SELECT DISTINCT CGU.OBJ# OBJ#, :B10 FLGS FROM SYS.COL_GROUP_USAGE$ CGU, OPTSTAT_USER_PREFS$ P1 WHERE BITAND(CGU.FLAGS, :B9 + :B8 ) = 0 AND BITAND(CGU.FLAGS, :B7 + :B6 ) > 0 AND CGU.OBJ# = P1.OBJ#(+) AND P1.PNAME(+)='AUTO_STAT_EXTENSIONS' AND NVL(P1.VALCHAR, :B5 ) = 'ON' ) GROUP BY OBJ# ) MCS ON (MM.OBJ# = MCS.OBJ#) WHEN MATCHED THEN 
UPDATE SET FLAGS = FLAGS + MCS.FLGS - BITAND(FLAGS, MCS.FLGS) WHEN NOT MATCHED THEN INSERT (OBJ#, INSERTS, UPDATES, DELETES, TIMESTAMP, FLAGS, DROP_SEGMENTS) VALUES(MCS.OBJ#, 0, 0, 0, :B11 , MCS.FLGS, 0)

We can see a subquery based on COL_USAGE$ (COL_USAGE$ tracks the predicate usage):

In 19c, MERGE doesn’t have a subquery with COL_USAGE$:

merge /* KSXM:OPTIM_DML_INF */  into sys.mon_mods_all$ m                   using dual                                                                 on (m.obj# = :objn and                                                         (:pobjn = 0 or                                                              m.obj# = :pobjn or                                                         m.obj# in (select obj#                                                                from tabpart$                                                              where bo# = :pobjn and obj# = :objn                                        union all                                                                  select obj#                                                                from tabcompart$                                                           where bo# = :pobjn and obj# = :objn                                        union all                                                                  select tsp.obj#                                                            from tabsubpart$ tsp, tabcompart$ tcp                                      where tsp.pobj# = tcp.obj#                                                   and (tcp.bo# = :pobjn or tcp.obj# = :pobjn)                                and tsp.obj# = :objn)))                                  when matched then                                                            update                                                                     set inserts = decode(:reset, 1, 0, inserts) + :ins,                            updates = decode(:reset, 1, 0, updates) + :upd,                            deletes = decode(:reset, 1, 0, deletes) + :del,                            flags =  decode(:reset, 1, 0, flags - bitand(flags, :flag)) + :flag,       drop_segments = decode(:reset, 1, 0, drop_segments) + :dropseg,            timestamp = :time                                                    when NOT matched then                                                        insert (obj#, inserts, updates, deletes,                                           timestamp, flags, drop_segments)                                   values (:objn, :ins, :upd, :del, :time, :flag, :dropseg)                   where :ins != 0 or :upd != 0 or :del != 0 or :flag != 0 or :dropseg !=0

Therefore, DBMS_STATS.GATHER_SCHEMA_STATS doesn’t set FLAGS based on predicate usage.

But if you think you’re safe in 19c, you’re wrong. Below is the scenario how the same problem can occur in 19c.

AUTOSTATS_TARGET=ORACLE

Some of our applications contain heavy ETL processing. Since many tables are volatile, statistics can’t be gathered at some random times. Therefore, the application schedules schema statistics collection, and we had to configure automatic optimizer statistics gathering only for dictionary tables:

BEGIN
  DBMS_STATS.SET_GLOBAL_PREFS('AUTOSTATS_TARGET','ORACLE');
END;
/

select dbms_stats.get_prefs('AUTOSTATS_TARGET') from dual ;

DBMS_STATS.GET_PREFS('AUTOSTATS_TARGET')
--------------------------------------------------------------------------------
ORACLE

Let’s see how this causes trouble in both 19.13 and 21.4.

The following commands create a table and then query it to populate COL_USAGE$.

drop table t ;
create table t ( n integer ) ;
insert into t select rownum from dual connect by level <= 100 ;

begin
  dbms_stats.gather_table_stats( 
    ownname => null, tabname => 'T' ) ;
end ;
/

alter session set nls_date_format='dd.mm.yyyy hh24:mi:ss' ;
insert into t values (1);
exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO ;

exec dbms_session.sleep(1) ;
select * from t where n = 1 ;
exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO ;

--column usage timestamp must be greater than analysze time
exec dbms_session.sleep(1) ;

FLAGS is zero and statistics are stale:

select inserts,flags from sys.mon_mods_all$ m, user_objects o
  where o.object_name = 'T' and o.object_id = m.obj# ;

   INSERTS      FLAGS
---------- ----------
         1          0

select last_analyzed, stale_stats from user_tab_statistics 
  where table_name = 'T' ;

LAST_ANALYZED       STALE_S
------------------- -------
12.11.2021 17:58:08 NO

Now I’m running the automatic optimizer statistics job:

--als sys
exec dbms_stats.gather_database_stats_job_proc () ;

This job expectedly didn’t gather statistics for our table, because it was configured to process only dictionary tables:

select last_analyzed, stale_stats from user_tab_statistics 
  where table_name = 'T' ;

LAST_ANALYZED       STALE_S
------------------- -------
12.11.2021 17:58:08 NO

But it set the third bit in FLAGS:

select inserts,flags from sys.mon_mods_all$ m, user_objects o
  where o.object_name = 'T' and o.object_id = m.obj# ;

   INSERTS      FLAGS
---------- ----------
         1          8

It put a little bump on the road, so to speak, that will lead the next statistics collection astray:

exec dbms_session.sleep(1) ;

begin
  dbms_stats.gather_schema_stats( 
    ownname => null, options => 'gather stale' ) ;
end ;
/

The statistics were gathered despite not being stale:

select last_analyzed, stale_stats from user_tab_statistics 
  where table_name = 'T' ;

LAST_ANALYZED       STALE_S
------------------- -------
12.11.2021 18:00:40 NO

In the SQL trace of the automatic optimizer statistics job we can find a similar MERGE as in 21c GATHER_SCHEMA_STATS:

MERGE /*+ dynamic_sampling(4) dynamic_sampling_est_cdn  */ INTO SYS.MON_MODS_ALL$ MM USING ( SELECT OBJ#, SUM(FLGS) FLGS FROM ( SELECT OBJ#, :B1 FLGS FROM ( SELECT DISTINCT TAB.OBJ# FROM (SELECT T.OBJ# OBJ#, T.OBJ# BO#, T.ANALYZETIME FROM SYS.TAB$ T WHERE BITAND(T.FLAGS,16) = 16 UNION ALL SELECT T.OBJ# OBJ#, T.BO# BO#, T.ANALYZETIME FROM SYS.TABPART$ T WHERE BITAND(T.FLAGS,2) = 2 UNION ALL SELECT T.OBJ# OBJ#, T.BO# BO#, T.ANALYZETIME FROM SYS.TABCOMPART$ T WHERE BITAND(T.FLAGS,2) = 2 UNION ALL SELECT T.OBJ# OBJ#, TCP.BO# BO#, T.ANALYZETIME FROM SYS.TABSUBPART$ T, SYS.TABCOMPART$ TCP WHERE BITAND(T.FLAGS,2) = 2 AND T.POBJ# = TCP.OBJ# ) TAB, SYS.OBJ$ O, SYS.COL$ C WHERE TAB.BO# = O.OBJ# AND TAB.ANALYZETIME < O.MTIME AND C.OBJ# = TAB.BO# AND C.TYPE# IN ( 1, 2, 12, 23, 69, 96, 100, 101, 178, 179, 180, 181, 182, 183, 231) AND BITAND(C.PROPERTY, 16384+32768) = 0 AND NOT (BITAND(C.PROPERTY, 32+65536+131072) = 32+65536 AND C.DEFAULT$ IS NULL ) AND NOT EXISTS (SELECT NULL FROM SYS.HIST_HEAD$ HH WHERE HH.OBJ# = TAB.OBJ# AND HH.INTCOL# = C.INTCOL#)) UNION ALL SELECT DISTINCT TAB.OBJ# OBJ#, :B3 FLGS FROM (SELECT T.OBJ# OBJ#, T.OBJ# BO#, T.ANALYZETIME, T.ROWCNT FROM SYS.TAB$ T WHERE BITAND(T.FLAGS,16) = 16 UNION ALL SELECT T.OBJ# OBJ#, T.BO# BO#, T.ANALYZETIME, T.ROWCNT FROM SYS.TABPART$ T WHERE BITAND(T.FLAGS,2) = 2 UNION ALL SELECT T.OBJ# OBJ#, T.BO# BO#, T.ANALYZETIME, T.ROWCNT FROM SYS.TABCOMPART$ T WHERE BITAND(T.FLAGS,2) = 2 UNION ALL SELECT T.OBJ# OBJ#, TCP.BO# BO#, T.ANALYZETIME, T.ROWCNT FROM SYS.TABSUBPART$ T, SYS.TABCOMPART$ TCP WHERE BITAND(T.FLAGS,2) = 2 AND T.POBJ# = TCP.OBJ# ) TAB, SYS.OBJ$ O, SYS.USER$ U, SYS.COL$ C, SYS.OPTSTAT_USER_PREFS$ P1 WHERE TAB.BO# = O.OBJ# AND C.OBJ# = TAB.BO# AND O.OWNER# = U.USER# AND TAB.OBJ# = P1.OBJ#(+) AND P1.PNAME(+) = 'METHOD_OPT' AND C.TYPE# IN ( 1, 2, 12, 23, 69, 96, 100, 101, 178, 179, 180, 181, 182, 183, 231) AND BITAND(C.PROPERTY, 16384+32768) = 0 AND NOT (BITAND(C.PROPERTY, 32+65536+131072) = 32+65536 AND C.DEFAULT$ IS NULL ) AND DBMS_STATS.COLUMN_NEED_HIST(U.NAME, O.NAME, C.NAME, NVL(P1.VALCHAR, :B2 )) = 1 AND EXISTS (
SELECT NULL FROM SYS.HIST_HEAD$ HH, SYS.COL_USAGE$ CU WHERE CU.OBJ# = TAB.BO# AND HH.OBJ# = TAB.OBJ# AND HH.INTCOL# = CU.INTCOL# AND BITAND(HH.SPARE2, 16) = 0 AND NOT (NVL(HH.ROW_CNT, 0) > 0 OR HH.NULL_CNT = TAB.ROWCNT) AND (CASE WHEN TAB.ANALYZETIME < CU.TIMESTAMP THEN (EQUALITY_PREDS + EQUIJOIN_PREDS + RANGE_PREDS + LIKE_PREDS + NONEQUIJOIN_PREDS) ELSE 0 END) > 0) UNION ALL SELECT DISTINCT CGU.OBJ# OBJ#, :B9 FLGS FROM SYS.COL_GROUP_USAGE$ CGU, OPTSTAT_USER_PREFS$ P1 WHERE BITAND(CGU.FLAGS, :B8 + :B7 ) = 0 AND BITAND(CGU.FLAGS, :B6 + :B5 ) > 0 AND CGU.OBJ# = P1.OBJ#(+) AND P1.PNAME(+)='AUTO_STAT_EXTENSIONS' AND NVL(P1.VALCHAR, :B4 ) = 'ON' ) GROUP BY OBJ# ) MCS ON (MM.OBJ# = MCS.OBJ#) WHEN MATCHED THEN 
UPDATE SET FLAGS = FLAGS + MCS.FLGS - BITAND(FLAGS, MCS.FLGS) WHEN NOT MATCHED THEN INSERT (OBJ#, INSERTS, UPDATES, DELETES, TIMESTAMP, FLAGS, DROP_SEGMENTS) VALUES(MCS.OBJ#, 0, 0, 0, :B10 , MCS.FLGS, 0)

Simply put, in 21c Oracle disseminated the spurious logic to other procedures.

A bug or not a bug? That is the question.

Traditionally, Oracle marks statistics “stale” when “enough” data in a table (or partition) changed. As we saw, statistics are also considered stale after somebody just ran a query and the database decided to create a histogram despite data changed only insignificantly. This can cause unpredictable long running statistics gathering and a significant load in the database.

In my opinion, only the amount of the changes in a table should be considered when gathering stale statistics. On the other hand, Oracle developers might argue that the statistics could be considered incomplete if the application starts using a new predicate with a literal value on a column without a histogram. If that’s the case, I’d prefer to have a parameter for controlling this behavior and distinguishing between both cases.

Automatic optimizer statistics collection definitely shouldn’t mess with user tables when configured only for dictionary tables.

Workarounds

I’m not aware of any parametrization that would influence this behaviour.

In Oracle 19c you can switch off the auto statistics job and run GATHER_SCHEMA_STATISTICS for different schemas and dictionary statistics.

In Oracle 21c, the only thing I can think of is writing a procedure that gathers table statistics for the tables with stale statistics.

Summary

Marking statistics stale isn’t only triggered by the amount of changes in a table. The current implementation considers the statistics stale also if somebody ran a query with literals on a column without histogram. This can cause lots of unnecessary heavy, long running statistics gathering. It isn’t clear whether this can be classified as a bug. The situation got worse in 21c because unexpected statistics gathering is triggered from more places than in 19c. You can overcome this by writing your own procedures for gathering stale statistics.

Update on November 15, 2021

Many thanks to Nigel Bayliss, the product manager for the Oracle Optimizer, for his insights below.

Column usage

If column usage changes and you are using METHOD_OPT with ‘SIZE AUTO’, then the database might choose to re-gather statistics and create a new histogram.

In contrast, METHOD_OPT with ‘SIZE 1’ or ‘SIZE REPEAT’ shouldn’t trigger re-gathering statistics solely based on column usage, but they do.

Therefore, Nigel filed the following new bug :

Bug 33569852 – HISTOGRAM STALE CHECK IGNORES DBMS_STATS PARAMETER

Apparently, there’s a logic that checks for ‘SIZE 1’ or ‘SIZE ‘REPEAT’ when deciding whether to re-gather statistics based on changed column usage, but it checks only the table preference value and ignores the parameter specified in the gather_*_stats call. So you can use the following workaround for preventing statistics re-gathering based only on new column usage:

exec dbms_stats.set_table_prefs(user,'t','method_opt','for all columns size 1')

(Storing preferences is better than specifying them as a parameter to API anyway.)
Then, you can specify METHOD_OPT with ‘REPEAT’ at the DBMS_STATS.GATHER_SCHEMA_STATS call, for example, to gather existing histograms, once the statistics become STALE becuase of data changes.

AUTOSTATS_TARGET = ORACLE

Automatic optimizer statistics collection job must not manipulate flags of non-dictionary tables. Nigel filed the following new bug:

Bug 33569886 – AUTO STATS AUTOSTATS_TARGET=ORACLE SETS HISTOGRAM STALE FLAG IN NON-ORACLE SCHEMAS

Thanks for sharing

Unexpected Optimizer Statistics Collection