In my previous blog post I described an optimizer bug where the cost of an operation in the execution plan was massively underestimated. This is how the execution plan looked like:
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 48 (100)| |
| 1 | NESTED LOOPS OUTER | | 1 | 29 | 48 (5)| 00:00:01 |
|* 2 | HASH JOIN | | 1 | 16 | 48 (5)| 00:00:01 |
| 3 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
| 5 | TABLE ACCESS FULL | T3 | 1 | 26 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | T1 | 1000K| 2929K| 45 (3)| 00:00:01 |
| 7 | VIEW PUSHED PREDICATE | | 1 | 13 | 0 (0)| |
| 8 | NESTED LOOPS | | 1 | 16 | 1613 (1)| 00:00:01 |
|* 9 | INDEX UNIQUE SCAN | PK_T2 | 1 | 13 | 0 (0)| |
|* 10 | TABLE ACCESS FULL | T4 | 1 | 3 | 1613 (1)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."N1"="V2"."N1")
4 - access("N2"=PRIOR NULL)
9 - access("T2"."N1"="T1"."N1")
10 - filter(("T4"."N1"="T2"."N1" AND "T4"."N1"="T1"."N1"))
Apart from a ridiculous elapsed time, the problem revealed itself in the execution plan – the operation 7 had a lower cost than its child operation 8. That shouldn’t have happened, because the parent’s cost should also entail the costs of their children.
But why not scan all execution plans in the AWR repository for such anomalies?
The following query does exactly that – it simply identifies the operations with a lower cost than their children:
set linesize 200
column P_OP format A30
column C_OP format A30
select
c.sql_id,c.plan_hash_value, round(max(s.elapsed_time_total)/1e6) seconds,
p.id p_id, c.id c_id, p.cost p_cost, c.cost c_cost, c.cost - p.cost diff,
p.operation || ' ' || p.options p_op,
c.operation || ' ' || c.options c_op
from dba_hist_sql_plan c, dba_hist_sql_plan p, dba_hist_sqlstat s
where
c.dbid = ( select dbid from v$database ) and
c.dbid = p.dbid and
c.sql_id = p.sql_id and c.plan_hash_value = p.plan_hash_value and
c.parent_id = p.id and p.cost < c.cost
and c.sql_id = s.sql_id(+) and c.plan_hash_value = s.plan_hash_value(+)
and c.dbid = s.dbid
group by c.sql_id,c.plan_hash_value, c.id, c.depth, c.cost, c.parent_id,
c.operation, c.options, p.id, p.cost, p.operation, p.options
order by
-- max(s.elapsed_time_total) nulls first,
c.cost - p.cost,
c.depth
;
Find below a sample output:
...
SQL_ID PLAN_HASH_VALUE SECONDS P_ID C_ID P_COST C_COST DIFF P_OP C_OP
------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ------------------------------ ------------------------------
0dcpv2ccqsmhd 853671621 686 7 8 6019 22164 16145 VIEW SORT ORDER BY
0dcpv2ccqsmhd 1815802073 176 7 8 6021 22172 16151 VIEW SORT ORDER BY
0dcpv2ccqsmhd 1523297245 105 7 8 6241 22983 16742 VIEW SORT ORDER BY
0dcpv2ccqsmhd 2908874411 2 7 8 6751 25024 18273 VIEW SORT ORDER BY
390nnp8kb6y8d 150295021 55354 56 57 3 21306 21303 VIEW PUSHED PREDICATE NESTED LOOPS
1dpscuz0dpx17 1825909960 2 7 8 8252 30792 22540 VIEW SORT ORDER BY
1dpscuz0dpx17 1716121194 0 7 8 8305 30988 22683 VIEW SORT ORDER BY
1z4k7x6bybabx 3618098506 756 55 56 2 22756 22754 VIEW PUSHED PREDICATE HASH JOIN OUTER
fbrajakyvkb2p 3618098506 562 55 56 2 22756 22754 VIEW PUSHED PREDICATE HASH JOIN OUTER
0dcpv2ccqsmhd 2979690613 2 7 8 18390 67728 49338 VIEW SORT ORDER BY
1dpscuz0dpx17 3226924111 6 6 7 18490 69448 50958 VIEW SORT ORDER BY
1dpscuz0dpx17 3740554088 22 6 7 18508 69516 51008 VIEW SORT ORDER BY
2g8gx4ja5fm7s 2606037132 41 7 8 448305 522853 74548 SORT JOIN VIEW
20uhd3mr2kfvd 2591703555 222 21 27 7522 679670 672148 HASH JOIN VIEW
9c1hczh7hsxf3 1171142012 226 20 26 7522 679688 672166 HASH JOIN VIEW
44y663h05c264 338860802 153 2 3 513 1829850 1829337 NESTED LOOPS VIEW
44y663h05c264 1209751238 50 2 3 513 1835472 1834959 NESTED LOOPS VIEW
44y663h05c264 1765264009 89 2 3 513 1836176 1835663 NESTED LOOPS VIEW
adhz0uzcu79tx 1472624336 997 91 92 3 96798591 96798588 VIEW PUSHED PREDICATE MERGE JOIN OUTER
Columns for cost propagation errors query
Column | Description |
---|---|
SQL_ID | SQL ID |
PLAN_HASH_VALUE | execution plan hash value |
SECONDS | total elapsed seconds |
P_ID | parent step ID in the execution plan |
C_ID | child step ID in the execution plan |
P_COST | parent operation cost |
C_COST | child operation cost |
DIFF | difference between child and parent cost |
P_OP | parent operation name |
C_OP | child operation name |
The SECONDS column contains the total elapsed time. That’s certainly one dimension worth focusing on in order to identify candidates with the highest tuning potential. The other is the magnitude of the non-propagated cost (DIFF), which might indicate the operations where the optimizer made the largest mistakes.
Last but not least, ensure that you licensed the Diagnostic and Tuning Pack before running the query.
As an alternative, here’s the poor man’s way for obtaining similar information without any additional licenses:
select
c.sql_id,c.plan_hash_value, round(max(s.elapsed_time)/1e6) seconds,
p.id p_id, c.id c_id, p.cost p_cost, c.cost c_cost, c.cost - p.cost diff,
p.operation || ' ' || p.options p_op,
c.operation || ' ' || c.options c_op
from v$sql_plan c, v$sql_plan p, v$sqlarea s
where
c.sql_id = p.sql_id and c.plan_hash_value = p.plan_hash_value and
c.child_number = p.child_number and
c.parent_id = p.id and p.cost < c.cost
and c.sql_id = s.sql_id(+) and c.plan_hash_value = s.plan_hash_value(+)
group by c.sql_id,c.plan_hash_value, c.id, c.depth, c.cost, c.parent_id,
c.operation, c.options, p.id, p.cost, p.operation, p.options
order by
-- max(s.elapsed_time) nulls first ,
c.cost - p.cost,
c.depth
;
The query above gets the information from v$sql_plan and v$sqlarea instead of dba_hist_sql_plan and dba_hist_sqlstat, respectively. Since these are just the snapshots on the shared pool, you’d need to periodically run it to obtain the information over a larger time period.