In the previous installment, I described a case where Oracle optimizer doesn’t unnest the subquery in the SELECT clause, even when that would be a better option. Christian Antognini pointed out that this happens because the scalar subquery didn’t return a single value, which disqualified the subquery from the subquery unnesting transformation. Nevertheless, SQL Server managed to rewrite the same query to use a more efficient semi join instead, which resulted in orders of magnitude less logical reads.
That prompted me to test more such queries and explore the execution plans in both database products. This information can be used for manual optimization, as it’s often possible to rewrite a subquery in the SELECT clause as either a semi join or an outer join.
I tested on Oracle 18c and SQL Server 2017.
The non-indexed test tables t_1k and t_100k contain distinct integer values and are defined in my previous blog post.
Query 1 – EXISTS
select
case when exists (
select 1
from t_1k
where t_1k.n1 = t_100k.n1
) then 1 else 0 end
from t_100k ;
SQL Server:
StmtText StmtId NodeId Parent PhysicalOp LogicalOp Argument DefinedValues EstimateRows EstimateIO EstimateCPU AvgRowSize TotalSubtreeCost OutputList Warnings Type Parallel EstimateExecutions
select case when exists ( select 1 from t_1k where t_1k.n1 = t_100k.n1 ) then 1 else 0 end from t_100k ; 1 1 0 NULL NULL 1 NULL 100000 NULL NULL NULL 7.925597 NULL NULL SELECT 0 NULL
|--Compute Scalar(DEFINE:([Expr1007]=CASE WHEN [Expr1008] THEN (1) ELSE (0) END)) 1 2 1 Compute Scalar Compute Scalar DEFINE:([Expr1007]=CASE WHEN [Expr1008] THEN (1) ELSE (0) END) [Expr1007]=CASE WHEN [Expr1008] THEN (1) ELSE (0) END 100000 0 0.01 11 7.925597 [Expr1007] NULL PLAN_ROW 0 1
|--Merge Join(Left Semi Join, MANY-TO-MANY MERGE:([BVTOOLS].[dbo].[t_100k].[n1])=([BVTOOLS].[dbo].[t_1k].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1k].[n1]=[BVTOOLS].[dbo].[t_100k].[n1])) 1 3 2 Merge Join Left Semi Join MANY-TO-MANY MERGE:([BVTOOLS].[dbo].[t_100k].[n1])=([BVTOOLS].[dbo].[t_1k].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1k].[n1]=[BVTOOLS].[dbo].[t_100k].[n1]) NULL 100000 0.000313 0.0336502 9 7.915597 [Expr1008] NULL PLAN_ROW 0 1
|--Sort(ORDER BY:([BVTOOLS].[dbo].[t_100k].[n1] ASC)) 1 4 3 Sort Sort ORDER BY:([BVTOOLS].[dbo].[t_100k].[n1] ASC) NULL 100000 0.01126126 7.623925 11 7.866987 [BVTOOLS].[dbo].[t_100k].[n1] NULL PLAN_ROW 0 1
| |--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_100k])) 1 5 4 Table Scan Table Scan OBJECT:([BVTOOLS].[dbo].[t_100k]) [BVTOOLS].[dbo].[t_100k].[n1] 100000 0.1216435 0.110157 11 0.2318005 [BVTOOLS].[dbo].[t_100k].[n1] NULL PLAN_ROW 0 1
|--Sort(ORDER BY:([BVTOOLS].[dbo].[t_1k].[n1] ASC)) 1 6 3 Sort Sort ORDER BY:([BVTOOLS].[dbo].[t_1k].[n1] ASC) NULL 1 0.01126126 0.000100011 11 0.01464437 [BVTOOLS].[dbo].[t_1k].[n1] NULL PLAN_ROW 0 1
|--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_1k])) 1 7 6 Table Scan Table Scan OBJECT:([BVTOOLS].[dbo].[t_1k]) [BVTOOLS].[dbo].[t_1k].[n1] 1 0.003125 0.0001581 11 0.0032831 [BVTOOLS].[dbo].[t_1k].[n1] NULL PLAN_ROW 0 1
(100000 rows affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_1k'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_100k'. Scan count 1, logical reads 161, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Oracle:
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 488K| 356K (1)| 00:00:56 |
|* 1 | TABLE ACCESS FULL| T_1K | 1 | 4 | 4 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| T_100K | 100K| 488K| 33 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T_1K"."N1"=:B1)
consistent gets 499163
Conclusion: Unlike Oracle, SQL Server rewrote the query to merge semi join and produced a more efficient execution plan.
Query 2 – MAX
The query below is a part of the demo script subquery_unnesting.sql from Christian Antognini’s book Troubleshooting Oracle Performance, 2nd Edition:
select
t_100k.*,
(select max(t_1k.n1) from t_1k where t_1k.n1 = t_100k.n1) as max_id
from t_100k;
SQL Server:
StmtText StmtId NodeId Parent PhysicalOp LogicalOp Argument DefinedValues EstimateRows EstimateIO EstimateCPU AvgRowSize TotalSubtreeCost OutputList Warnings Type Parallel EstimateExecutions
select t_100k.*, (select max(t_1k.n1) from t_1k where t_1k.n1 = t_100k.n1) as max_id from t_100k; 1 1 0 NULL NULL 1 NULL 100000 NULL NULL NULL 1.072298 NULL NULL SELECT 0 NULL
|--Compute Scalar(DEFINE:([Expr1008]=[Expr1006])) 1 2 1 Compute Scalar Compute Scalar DEFINE:([Expr1008]=[Expr1006]) [Expr1008]=[Expr1006] 100000 0 0.01 15 1.072298 [BVTOOLS].[dbo].[t_100k].[n1], [Expr1008] NULL PLAN_ROW 0 1
|--Hash Match(Right Outer Join, HASH:([BVTOOLS].[dbo].[t_1k].[n1])=([BVTOOLS].[dbo].[t_100k].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1k].[n1]=[BVTOOLS].[dbo].[t_100k].[n1])) 1 3 2 Hash Match Right Outer Join HASH:([BVTOOLS].[dbo].[t_1k].[n1])=([BVTOOLS].[dbo].[t_100k].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1k].[n1]=[BVTOOLS].[dbo].[t_100k].[n1]) NULL 100000 0 0.7973643 15 1.062298 [BVTOOLS].[dbo].[t_100k].[n1], [Expr1006] NULL PLAN_ROW 0 1
|--Stream Aggregate(GROUP BY:([BVTOOLS].[dbo].[t_1k].[n1]) DEFINE:([Expr1006]=MAX([BVTOOLS].[dbo].[t_1k].[n1]))) 1 4 3 Stream Aggregate Aggregate GROUP BY:([BVTOOLS].[dbo].[t_1k].[n1]) [Expr1006]=MAX([BVTOOLS].[dbo].[t_1k].[n1]) 1000 0 0.0011 15 0.03313064 [BVTOOLS].[dbo].[t_1k].[n1], [Expr1006] NULL PLAN_ROW 0 1
| |--Sort(ORDER BY:([BVTOOLS].[dbo].[t_1k].[n1] ASC)) 1 5 4 Sort Sort ORDER BY:([BVTOOLS].[dbo].[t_1k].[n1] ASC) NULL 1000 0.01126126 0.01564663 11 0.03203064 [BVTOOLS].[dbo].[t_1k].[n1] NULL PLAN_ROW 0 1
| |--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_1k])) 1 6 5 Table Scan Table Scan OBJECT:([BVTOOLS].[dbo].[t_1k]) [BVTOOLS].[dbo].[t_1k].[n1] 1000 0.003865741 0.001257 11 0.005122741 [BVTOOLS].[dbo].[t_1k].[n1] NULL PLAN_ROW 0 1
|--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_100k])) 1 7 3 Table Scan Table Scan OBJECT:([BVTOOLS].[dbo].[t_100k]) [BVTOOLS].[dbo].[t_100k].[n1] 100000 0.1216435 0.110157 11 0.2318005 [BVTOOLS].[dbo].[t_100k].[n1] NULL PLAN_ROW 0 1
(100000 rows affected)
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_100k'. Scan count 1, logical reads 161, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_1k'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Oracle:
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 3027K| 39 (6)| 00:00:01 |
|* 1 | HASH JOIN RIGHT OUTER| | 100K| 3027K| 39 (6)| 00:00:01 |
| 2 | VIEW | VW_SSQ_1 | 1000 | 26000 | 5 (20)| 00:00:01 |
| 3 | HASH GROUP BY | | 1000 | 4000 | 5 (20)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T_1K | 1000 | 4000 | 4 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | T_100K | 100K| 488K| 33 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ITEM_1"(+)="T_100K"."N1")
consistent gets 168
Conclusion: Both database engines produced a good execution plan. Because of the MAX function, the optimizers recognized that the subquery returns a single value and then unnested the query.
Query 3 – DISTINCT
In the next query, I replaced MAX with DISTINCT which should also guarantee the single value.
select
t_100k.*,
(select distinct t_1k.n1 from t_1k where t_1k.n1 = t_100k.n1) as s_n1
from t_100k;
SQL Server:
StmtText StmtId NodeId Parent PhysicalOp LogicalOp Argument DefinedValues EstimateRows EstimateIO EstimateCPU AvgRowSize TotalSubtreeCost OutputList Warnings Type Parallel EstimateExecutions
select t_100k.*, (select distinct t_1k.n1 from t_1k where t_1k.n1 = t_100k.n1) as s_n1 from t_100k; 1 1 0 NULL NULL 1 NULL 100000 NULL NULL NULL 1.071298 NULL NULL SELECT 0 NULL
|--Compute Scalar(DEFINE:([Expr1007]=[BVTOOLS].[dbo].[t_1k].[n1])) 1 2 1 Compute Scalar Compute Scalar DEFINE:([Expr1007]=[BVTOOLS].[dbo].[t_1k].[n1]) [Expr1007]=[BVTOOLS].[dbo].[t_1k].[n1] 100000 0 0.01 15 1.071298 [BVTOOLS].[dbo].[t_100k].[n1], [Expr1007] NULL PLAN_ROW 0 1
|--Hash Match(Right Outer Join, HASH:([BVTOOLS].[dbo].[t_1k].[n1])=([BVTOOLS].[dbo].[t_100k].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1k].[n1]=[BVTOOLS].[dbo].[t_100k].[n1])) 1 3 2 Hash Match Right Outer Join HASH:([BVTOOLS].[dbo].[t_1k].[n1])=([BVTOOLS].[dbo].[t_100k].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1k].[n1]=[BVTOOLS].[dbo].[t_100k].[n1]) NULL 100000 0 0.7973643 15 1.061298 [BVTOOLS].[dbo].[t_100k].[n1], [BVTOOLS].[dbo].[t_1k].[n1] NULL PLAN_ROW 0 1
|--Sort(DISTINCT ORDER BY:([BVTOOLS].[dbo].[t_1k].[n1] ASC)) 1 5 3 Sort Distinct Sort DISTINCT ORDER BY:([BVTOOLS].[dbo].[t_1k].[n1] ASC) NULL 1000 0.01126126 0.01564664 15 0.03213064 [BVTOOLS].[dbo].[t_1k].[n1] NULL PLAN_ROW 0 1
| |--Compute Scalar(DEFINE:([Expr1006]=[BVTOOLS].[dbo].[t_1k].[n1])) 1 6 5 Compute Scalar Compute Scalar DEFINE:([Expr1006]=[BVTOOLS].[dbo].[t_1k].[n1]) [Expr1006]=[BVTOOLS].[dbo].[t_1k].[n1] 1000 0 0.0001 15 0.005222741 [BVTOOLS].[dbo].[t_1k].[n1], [Expr1006] NULL PLAN_ROW 0 1
| |--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_1k])) 1 7 6 Table Scan Table Scan OBJECT:([BVTOOLS].[dbo].[t_1k]) [BVTOOLS].[dbo].[t_1k].[n1] 1000 0.003865741 0.001257 11 0.005122741 [BVTOOLS].[dbo].[t_1k].[n1] NULL PLAN_ROW 0 1
|--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_100k])) 1 8 3 Table Scan Table Scan OBJECT:([BVTOOLS].[dbo].[t_100k]) [BVTOOLS].[dbo].[t_100k].[n1] 100000 0.1216435 0.110157 11 0.2318005 [BVTOOLS].[dbo].[t_100k].[n1] NULL PLAN_ROW 0 1
(100000 rows affected)
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_100k'. Scan count 1, logical reads 161, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_1k'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Oracle:
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 488K| 464K (21)| 00:01:13 |
| 1 | SORT UNIQUE NOSORT| | 1 | 4 | 5 (20)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| T_1K | 1 | 4 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | T_100K | 100K| 488K| 33 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T_1K"."N1"=:B1)
consistent gets 500163
Conclusion: Although DISTINCT guarantees the uniqueness and, thus, the scalar subquery returns a single result per driving row, Oracle optimizer didn’t unnest the subquery, which resulted in a suboptimal execution plan.
Query 4 – primary key
In this example I’ll be using a large table with a primary key:
SQL Server:
create table t_1m_pk (
n1 integer, c1 nvarchar(100),
constraint pk_t_1m primary key nonclustered (n1)
) ;
insert into t_1m_pk
SELECT TOP (1000000)
n1 = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id])),
'A'
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);
Oracle:
create table t_1m_pk (
n1 integer, c1 varchar2(1),
constraint pk_1m_pk primary key (n1)
) ;
insert into t_1m_pk
select level , 'A'
from dual
connect by level <= 1000000;
commit ;
begin
dbms_stats.gather_table_stats ( null, 'T_1M_PK') ;
end ;
/
Query:
select
t_100k.*,
(select t_1m_pk.c1 from t_1m_pk where t_1m_pk.n1 = t_100k.n1) as s_n1
from t_100k;
SQL Server:
StmtText StmtId NodeId Parent PhysicalOp LogicalOp Argument DefinedValues EstimateRows EstimateIO EstimateCPU AvgRowSize TotalSubtreeCost OutputList Warnings Type Parallel EstimateExecutions
select t_100k.*, (select t_1m_pk.c1 from t_1m_pk where t_1m_pk.n1 = t_100k.n1) as s_n1 from t_100k; 1 1 0 NULL NULL 1 NULL 100000 NULL NULL NULL 11.14075 NULL NULL SELECT 0 NULL
|--Compute Scalar(DEFINE:([Expr1007]=[BVTOOLS].[dbo].[t_1m_pk].[c1])) 1 2 1 Compute Scalar Compute Scalar DEFINE:([Expr1007]=[BVTOOLS].[dbo].[t_1m_pk].[c1]) [Expr1007]=[BVTOOLS].[dbo].[t_1m_pk].[c1] 100000 0 0.01 16 11.14075 [BVTOOLS].[dbo].[t_100k].[n1], [Expr1007] NULL PLAN_ROW 0 1
|--Hash Match(Left Outer Join, HASH:([BVTOOLS].[dbo].[t_100k].[n1])=([BVTOOLS].[dbo].[t_1m_pk].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1m_pk].[n1]=[BVTOOLS].[dbo].[t_100k].[n1])) 1 3 2 Hash Match Left Outer Join HASH:([BVTOOLS].[dbo].[t_100k].[n1])=([BVTOOLS].[dbo].[t_1m_pk].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1m_pk].[n1]=[BVTOOLS].[dbo].[t_100k].[n1]) NULL 100000 0 8.057143 16 11.13075 [BVTOOLS].[dbo].[t_100k].[n1], [BVTOOLS].[dbo].[t_1m_pk].[c1] NULL PLAN_ROW 0 1
|--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_100k])) 1 4 3 Table Scan Table Scan OBJECT:([BVTOOLS].[dbo].[t_100k]) [BVTOOLS].[dbo].[t_100k].[n1] 100000 0.1216435 0.110157 11 0.2318005 [BVTOOLS].[dbo].[t_100k].[n1] NULL PLAN_ROW 0 1
|--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_1m_pk])) 1 5 3 Table Scan Table Scan OBJECT:([BVTOOLS].[dbo].[t_1m_pk]) [BVTOOLS].[dbo].[t_1m_pk].[n1], [BVTOOLS].[dbo].[t_1m_pk].[c1] 1000000 1.741644 1.100157 16 2.8418 [BVTOOLS].[dbo].[t_1m_pk].[n1], [BVTOOLS].[dbo].[t_1m_pk].[c1] NULL PLAN_ROW 0 1
(100000 rows affected)
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_1m_pk'. Scan count 1, logical reads 2348, physical reads 0, read-ahead reads 845, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_100k'. Scan count 1, logical reads 161, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Oracle:
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 488K| 181K (1)| 00:00:29 |
| 1 | TABLE ACCESS BY INDEX ROWID| T_1M_PK | 1 | 7 | 2 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | PK_1M_PK | 1 | | 1 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | T_100K | 100K| 488K| 33 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T_1M_PK"."N1"=:B1)
consistent gets 100731
Conclusion: Although the primary key guarantees a single value in the subquery, Oracle didn’t unnest it, which in turn resulted in a suboptimal plan with much more consistent gets.
Summary
In summary, Oracle optimizer sometimes doesn’t unnest the subqueries in the SELECT clause. Therefore, in such cases, it might make sense to check the execution statistics and consider rewriting the query by using a semi or an outer join.
One Comment