WINDOW SORT PUSHED RANK Performance Regression in Oracle 21c

Merge with a WINDOW SORT PUSHED RANK in the execution plan can perform drastically worse in Oracle 21c than in 19c (tested on 21.6 and 19.15, respectively).

Here’s the test case:

create table t1 (n1 number, c1 varchar2(10)) ;
create table t2 (n1 number, c1 varchar2(10), n2 number) ;

insert into t1 select level, null from dual connect by level <= 5*1e5 ;
insert into t2 select 5*1e5 + level, 'X', 1 from dual connect by level <= 4*1e5 ;
commit ;

exec dbms_stats.gather_table_stats(null, 'T1');
exec dbms_stats.gather_table_stats(null, 'T2');

set timing on 

merge /*+ gather_plan_statistics monitor */ --nenad
into t1 using
( select n1,c1 from 
  ( select n1,c1, 
      row_number() over (partition by n1 order by n2 desc) rn
	from t2
  ) where rn = 1
) v1
on ( v1.n1 = t1.n1 )
when matched then 
update set t1.c1 = v1.c1 ;

On 19c, merge took less than a second to complete:

Global Stats
=========================================
| Elapsed |   Cpu   |  Other   | Buffer |
| Time(s) | Time(s) | Waits(s) |  Gets  |
=========================================
|    0.41 |    0.40 |     0.01 |   1683 |
=========================================

SQL Plan Monitoring Details (Plan Hash Value=3917951292)
===========================================================================================================================================
| Id |           Operation           | Name |  Rows   | Cost |   Time    | Start  | Execs |   Rows   |  Mem  | Activity | Activity Detail |
|    |                               |      | (Estim) |      | Active(s) | Active |       | (Actual) | (Max) |   (%)    |   (# samples)   |
===========================================================================================================================================
|  0 | MERGE STATEMENT               |      |         |      |         1 |     +1 |     1 |        1 |     . |          |                 |
|  1 |   MERGE                       | T1   |         |      |         1 |     +1 |     1 |        1 |     . |          |                 |
|  2 |    VIEW                       |      |         |      |           |        |     1 |          |     . |          |                 |
|  3 |     HASH JOIN                 |      |       1 | 3161 |         1 |     +1 |     1 |        0 |  32MB |          |                 |
|  4 |      TABLE ACCESS FULL        | T1   |    500K |  155 |         1 |     +1 |     1 |     500K |     . |          |                 |
|  5 |      VIEW                     |      |    400K | 1750 |         1 |     +1 |     1 |     400K |     . |          |                 |
|  6 |       WINDOW SORT PUSHED RANK |      |    400K | 1750 |         1 |     +1 |     1 |     400K |  23MB |          |                 |
|  7 |        TABLE ACCESS FULL      | T2   |    400K |  155 |         1 |     +1 |     1 |     400K |     . |          |                 |
===========================================================================================================================================

On 21c, the identical execution plan took 185s, mostly on CPU:

Global Stats
=========================================
| Elapsed |   Cpu   |  Other   | Buffer |
| Time(s) | Time(s) | Waits(s) |  Gets  |
=========================================
|     185 |     180 |     4.55 |   1639 |
=========================================

SQL Plan Monitoring Details (Plan Hash Value=3917951292)
===========================================================================================================================================
| Id |           Operation           | Name |  Rows   | Cost |   Time    | Start  | Execs |   Rows   |  Mem  | Activity | Activity Detail |
|    |                               |      | (Estim) |      | Active(s) | Active |       | (Actual) | (Max) |   (%)    |   (# samples)   |
===========================================================================================================================================
|  0 | MERGE STATEMENT               |      |         |      |         1 |   +184 |     1 |        1 |     . |          |                 |
|  1 |   MERGE                       | T1   |         |      |         1 |   +184 |     1 |        1 |     . |          |                 |
|  2 |    VIEW                       |      |         |      |           |        |     1 |          |     . |          |                 |
|  3 |     HASH JOIN                 |      |       1 | 3161 |       183 |     +2 |     1 |        0 |  32MB |          |                 |
|  4 |      TABLE ACCESS FULL        | T1   |    500K |  155 |         1 |     +2 |     1 |     500K |     . |          |                 |
|  5 |      VIEW                     |      |    400K | 1750 |       183 |     +2 |     1 |     400K |     . |          |                 |
|  6 |       WINDOW SORT PUSHED RANK |      |    400K | 1750 |       184 |     +1 |     1 |     400K |  23MB |          |                 |
|  7 |        TABLE ACCESS FULL      | T2   |    400K |  155 |         1 |     +2 |     1 |     400K |     . |          |                 |
===========================================================================================================================================

In spite of the identical execution plan and the same number of logical reads, the execution was much longer in 21c. All the extra time was spent in the WINDOW SORT PUSHED RANK step. In particular, the profiled execution shows that the most of the time was spent in the Oracle C function smbgetqbPart:

export BPFTRACE_CACHE_USER_SYMBOLS=1
bpftrace -e 'profile:hz:99 
/ pid == 2990815 / 
{ 
  @[ustack()] = count(); 
}'

@[
smbgetqbPart+477
sorgetqbf+599
qerwnFetch+718
qerstFetch+534
qervwFetch+163
qerstFetch+534
rwsfcd+151
qerstFetch+534
qerhnFetch+6565
qerstFetch+534
qervwFetch+163
qerstFetch+534
rwsfcd+151
qerstFetch+534
qerusFetch+685
qerstFetch+534
updaul+3457
updThreePhaseExe+325
upsexe+1024
opiexe+12237
kpoal8+2239
opiodr+1256
ttcpip+1257
opitsk+1906
opiino+1008
opiodr+1256
opidrv+1067
sou2o+165
opimai_real+400
ssthrdmain+412
main+292
__libc_start_main+243
0x5541f689495641d7
]: 1886

Based on the naming, I’m guessing that the function smbgetqbPart gets the query block window partition. The problem is a very high number of executions (800001), which, intuitively, seems unnecessary:

bpftrace -e 'uprobe:/u00/oracle/orabase/product/21.6.0.0.220419_b/bin/oracle:smbgetqbPart+2
/ pid == 2990815 / 
{ 
  @ = count(); 
}'

@: 800001

In contrast, in 19c the function hasn’t been executed at all:

bpftrace -e 'uprobe:/u00/oracle/orabase/product/19.15.0.0.220419_a/bin/oracle:smbgetqbPart+2
/ pid == 3139940 / 
{ 
  @ = count(); 
}'

@: 0

Simply put, the window partitioning execution changed between two releases. In particular, the following optimization introduced the problem:

select bugno, value, sql_feature, description, optimizer_feature_enable 
  from v$system_fix_control 
  where lower(description) like '%pushed rank%' ;

     BUGNO      VALUE
---------- ----------
SQL_FEATURE
----------------------------------------------------------------
DESCRIPTION
----------------------------------------------------------------
OPTIMIZER_FEATURE_ENABLE
-------------------------
  30822446          1
QKSFM_EXECUTION_30822446
enable window pushed rank optimization with partition by keys
21.1.0

After switching off the bug fix:

alter session set "_FIX_CONTROL"='30822446:OFF'  ;

the execution time reduced to the normal levels:

Elapsed: 00:00:00.40

Also, the function smbgetqbPart isn’t being executed anymore:

bpftrace -e 'uprobe:/u00/oracle/orabase/product/21.6.0.0.220419_b/bin/oracle:smbgetqbPart+2
/ pid == 3154842 / 
{ 
  @ = count(); 
}'

@: 0

In conclusion, the new window pushed rank optimization can cause a severe performance degradation in Oracle 21c. Fortunately, the change can be switched off with the fix control (not documented in the Oracle support database).

Update on January 18, 2023

Oracle released the patch “34381258 MERGE PERFORMANCE ISSUE AFTER BUG FIX 30822446” to fix the problem. Thanks to Martin Decker for mentioning this in the comment. The patch is currently available for 19c but not for 21c. I tested it on top of 19.17.

Thanks for sharing

Nenad Noveljic

10 Comments

  1. Nenad
    Thanks for sharing this information.
    This will allow me to avoid spending hours and hours looking for the reason of this problem when we upgrade to 21c
    Best regards

    • Hello Mohamed, I’ve learned a lot from your articles too!

  2. Hi Nenad,

    very nice analysis. I have identified this issue after patching to 19.15.0 and enabling all dbms_optim_bundle fix controls. With repetitive trial and error disabling one fix control after, I could identified the this fix control for this issue a couple of weeks ago. Thought about logging an SR, but it takes so much effort to get suitable engineer that understands the problem. Now globally disabled this fix_control but left other dbms_optim_bundle fix controls enabled.

    Regards,
    Martin

    • Hi Martin,
      Thank you for this information. The bug fix seem to appeared first in 19.12. It’s disabled by default in 19c.
      Best regards,
      Nenad

    • Thanks. Do you have the text of this bug? It isn’t visible on MOS.

  3. Hi Nenad,

    sorry, I don´t. But since September, there is an interim patch available:

    34381258 MERGE PERFORMANCE ISSUE AFTER BUG FIX 30822446

    Regards,
    Martin

  4. Just to let you know we ran into this bug and your blog helped us out considerably.
    Interestingly we were using the fix_control for 30822446 set ON to allow us to install the 19.17 RU because that RU introduced a bug requiring it.
    We went down the path of installing the interim patch 34381258 on top of 19.17 which seems to fix both issues for us

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.