Correlated Subqueries in the SELECT Clause (2)

In the previous installment, I described a case where Oracle optimizer doesn’t unnest the subquery in the SELECT clause, even when that would be a better option. Christian Antognini pointed out that this happens because the scalar subquery didn’t return a single value, which disqualified the subquery from the subquery unnesting transformation. Nevertheless, SQL Server managed to rewrite the same query to use a more efficient semi join instead, which resulted in orders of magnitude less logical reads.

That prompted me to test more such queries and explore the execution plans in both database products. This information can be used for manual optimization, as it’s often possible to rewrite a subquery in the SELECT clause as either a semi join or an outer join.

I tested on Oracle 18c and SQL Server 2017.

The non-indexed test tables t_1k and t_100k contain distinct integer values and are defined in my previous blog post.

Query 1 – EXISTS

select 
  case when exists ( 
    select  1 
	  from t_1k 
	  where t_1k.n1 = t_100k.n1 
  ) then 1 else 0 end
  from t_100k ; 

SQL Server:

StmtText	StmtId	NodeId	Parent	PhysicalOp	LogicalOp	Argument	DefinedValues	EstimateRows	EstimateIO	EstimateCPU	AvgRowSize	TotalSubtreeCost	OutputList	Warnings	Type	Parallel	EstimateExecutions
select     case when exists (       select  1      from t_1k      where t_1k.n1 = t_100k.n1     ) then 1 else 0 end    from t_100k ;	1	1	0	NULL	NULL	1	NULL	100000	NULL	NULL	NULL	7.925597	NULL	NULL	SELECT	0	NULL
  |--Compute Scalar(DEFINE:([Expr1007]=CASE WHEN [Expr1008] THEN (1) ELSE (0) END))	1	2	1	Compute Scalar	Compute Scalar	DEFINE:([Expr1007]=CASE WHEN [Expr1008] THEN (1) ELSE (0) END)	[Expr1007]=CASE WHEN [Expr1008] THEN (1) ELSE (0) END	100000	0	0.01	11	7.925597	[Expr1007]	NULL	PLAN_ROW	0	1
       |--Merge Join(Left Semi Join, MANY-TO-MANY MERGE:([BVTOOLS].[dbo].[t_100k].[n1])=([BVTOOLS].[dbo].[t_1k].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1k].[n1]=[BVTOOLS].[dbo].[t_100k].[n1]))	1	3	2	Merge Join	Left Semi Join	MANY-TO-MANY MERGE:([BVTOOLS].[dbo].[t_100k].[n1])=([BVTOOLS].[dbo].[t_1k].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1k].[n1]=[BVTOOLS].[dbo].[t_100k].[n1])	NULL	100000	0.000313	0.0336502	9	7.915597	[Expr1008]	NULL	PLAN_ROW	0	1
            |--Sort(ORDER BY:([BVTOOLS].[dbo].[t_100k].[n1] ASC))	1	4	3	Sort	Sort	ORDER BY:([BVTOOLS].[dbo].[t_100k].[n1] ASC)	NULL	100000	0.01126126	7.623925	11	7.866987	[BVTOOLS].[dbo].[t_100k].[n1]	NULL	PLAN_ROW	0	1
            |    |--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_100k]))	1	5	4	Table Scan	Table Scan	OBJECT:([BVTOOLS].[dbo].[t_100k])	[BVTOOLS].[dbo].[t_100k].[n1]	100000	0.1216435	0.110157	11	0.2318005	[BVTOOLS].[dbo].[t_100k].[n1]	NULL	PLAN_ROW	0	1
            |--Sort(ORDER BY:([BVTOOLS].[dbo].[t_1k].[n1] ASC))	1	6	3	Sort	Sort	ORDER BY:([BVTOOLS].[dbo].[t_1k].[n1] ASC)	NULL	1	0.01126126	0.000100011	11	0.01464437	[BVTOOLS].[dbo].[t_1k].[n1]	NULL	PLAN_ROW	0	1
                 |--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_1k]))	1	7	6	Table Scan	Table Scan	OBJECT:([BVTOOLS].[dbo].[t_1k])	[BVTOOLS].[dbo].[t_1k].[n1]	1	0.003125	0.0001581	11	0.0032831	[BVTOOLS].[dbo].[t_1k].[n1]	NULL	PLAN_ROW	0	1
(100000 rows affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_1k'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_100k'. Scan count 1, logical reads 161, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Oracle:

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |   100K|   488K|   356K  (1)| 00:00:56 |
|*  1 |  TABLE ACCESS FULL| T_1K   |     1 |     4 |     4   (0)| 00:00:01 |
|   2 |  TABLE ACCESS FULL| T_100K |   100K|   488K|    33   (0)| 00:00:01 |
----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("T_1K"."N1"=:B1)
consistent gets	499163

Conclusion: Unlike Oracle, SQL Server rewrote the query to merge semi join and produced a more efficient execution plan.

Query 2 – MAX

The query below is a part of the demo script subquery_unnesting.sql from Christian Antognini’s book Troubleshooting Oracle Performance, 2nd Edition:

select 
  t_100k.*,
  (select max(t_1k.n1) from t_1k where t_1k.n1 = t_100k.n1) as max_id
from t_100k;

SQL Server:

StmtText	StmtId	NodeId	Parent	PhysicalOp	LogicalOp	Argument	DefinedValues	EstimateRows	EstimateIO	EstimateCPU	AvgRowSize	TotalSubtreeCost	OutputList	Warnings	Type	Parallel	EstimateExecutions
select     t_100k.*,    (select max(t_1k.n1) from t_1k where t_1k.n1 = t_100k.n1) as max_id  from t_100k;	1	1	0	NULL	NULL	1	NULL	100000	NULL	NULL	NULL	1.072298	NULL	NULL	SELECT	0	NULL
  |--Compute Scalar(DEFINE:([Expr1008]=[Expr1006]))	1	2	1	Compute Scalar	Compute Scalar	DEFINE:([Expr1008]=[Expr1006])	[Expr1008]=[Expr1006]	100000	0	0.01	15	1.072298	[BVTOOLS].[dbo].[t_100k].[n1], [Expr1008]	NULL	PLAN_ROW	0	1
       |--Hash Match(Right Outer Join, HASH:([BVTOOLS].[dbo].[t_1k].[n1])=([BVTOOLS].[dbo].[t_100k].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1k].[n1]=[BVTOOLS].[dbo].[t_100k].[n1]))	1	3	2	Hash Match	Right Outer Join	HASH:([BVTOOLS].[dbo].[t_1k].[n1])=([BVTOOLS].[dbo].[t_100k].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1k].[n1]=[BVTOOLS].[dbo].[t_100k].[n1])	NULL	100000	0	0.7973643	15	1.062298	[BVTOOLS].[dbo].[t_100k].[n1], [Expr1006]	NULL	PLAN_ROW	0	1
            |--Stream Aggregate(GROUP BY:([BVTOOLS].[dbo].[t_1k].[n1]) DEFINE:([Expr1006]=MAX([BVTOOLS].[dbo].[t_1k].[n1])))	1	4	3	Stream Aggregate	Aggregate	GROUP BY:([BVTOOLS].[dbo].[t_1k].[n1])	[Expr1006]=MAX([BVTOOLS].[dbo].[t_1k].[n1])	1000	0	0.0011	15	0.03313064	[BVTOOLS].[dbo].[t_1k].[n1], [Expr1006]	NULL	PLAN_ROW	0	1
            |    |--Sort(ORDER BY:([BVTOOLS].[dbo].[t_1k].[n1] ASC))	1	5	4	Sort	Sort	ORDER BY:([BVTOOLS].[dbo].[t_1k].[n1] ASC)	NULL	1000	0.01126126	0.01564663	11	0.03203064	[BVTOOLS].[dbo].[t_1k].[n1]	NULL	PLAN_ROW	0	1
            |         |--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_1k]))	1	6	5	Table Scan	Table Scan	OBJECT:([BVTOOLS].[dbo].[t_1k])	[BVTOOLS].[dbo].[t_1k].[n1]	1000	0.003865741	0.001257	11	0.005122741	[BVTOOLS].[dbo].[t_1k].[n1]	NULL	PLAN_ROW	0	1
            |--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_100k]))	1	7	3	Table Scan	Table Scan	OBJECT:([BVTOOLS].[dbo].[t_100k])	[BVTOOLS].[dbo].[t_100k].[n1]	100000	0.1216435	0.110157	11	0.2318005	[BVTOOLS].[dbo].[t_100k].[n1]	NULL	PLAN_ROW	0	1
(100000 rows affected)
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_100k'. Scan count 1, logical reads 161, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_1k'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Oracle:

----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |   100K|  3027K|    39   (6)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT OUTER|          |   100K|  3027K|    39   (6)| 00:00:01 |
|   2 |   VIEW                | VW_SSQ_1 |  1000 | 26000 |     5  (20)| 00:00:01 |
|   3 |    HASH GROUP BY      |          |  1000 |  4000 |     5  (20)| 00:00:01 |
|   4 |     TABLE ACCESS FULL | T_1K     |  1000 |  4000 |     4   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL   | T_100K   |   100K|   488K|    33   (0)| 00:00:01 |
----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("ITEM_1"(+)="T_100K"."N1")
consistent gets	168

Conclusion: Both database engines produced a good execution plan. Because of the MAX function, the optimizers recognized that the subquery returns a single value and then unnested the query.

Query 3 – DISTINCT

In the next query, I replaced MAX with DISTINCT which should also guarantee the single value.

select 
  t_100k.*,
  (select distinct t_1k.n1 from t_1k where t_1k.n1 = t_100k.n1) as s_n1
from t_100k;

SQL Server:

StmtText	StmtId	NodeId	Parent	PhysicalOp	LogicalOp	Argument	DefinedValues	EstimateRows	EstimateIO	EstimateCPU	AvgRowSize	TotalSubtreeCost	OutputList	Warnings	Type	Parallel	EstimateExecutions
select     t_100k.*,    (select distinct t_1k.n1 from t_1k where t_1k.n1 = t_100k.n1) as s_n1  from t_100k;	1	1	0	NULL	NULL	1	NULL	100000	NULL	NULL	NULL	1.071298	NULL	NULL	SELECT	0	NULL
  |--Compute Scalar(DEFINE:([Expr1007]=[BVTOOLS].[dbo].[t_1k].[n1]))	1	2	1	Compute Scalar	Compute Scalar	DEFINE:([Expr1007]=[BVTOOLS].[dbo].[t_1k].[n1])	[Expr1007]=[BVTOOLS].[dbo].[t_1k].[n1]	100000	0	0.01	15	1.071298	[BVTOOLS].[dbo].[t_100k].[n1], [Expr1007]	NULL	PLAN_ROW	0	1
       |--Hash Match(Right Outer Join, HASH:([BVTOOLS].[dbo].[t_1k].[n1])=([BVTOOLS].[dbo].[t_100k].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1k].[n1]=[BVTOOLS].[dbo].[t_100k].[n1]))	1	3	2	Hash Match	Right Outer Join	HASH:([BVTOOLS].[dbo].[t_1k].[n1])=([BVTOOLS].[dbo].[t_100k].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1k].[n1]=[BVTOOLS].[dbo].[t_100k].[n1])	NULL	100000	0	0.7973643	15	1.061298	[BVTOOLS].[dbo].[t_100k].[n1], [BVTOOLS].[dbo].[t_1k].[n1]	NULL	PLAN_ROW	0	1
            |--Sort(DISTINCT ORDER BY:([BVTOOLS].[dbo].[t_1k].[n1] ASC))	1	5	3	Sort	Distinct Sort	DISTINCT ORDER BY:([BVTOOLS].[dbo].[t_1k].[n1] ASC)	NULL	1000	0.01126126	0.01564664	15	0.03213064	[BVTOOLS].[dbo].[t_1k].[n1]	NULL	PLAN_ROW	0	1
            |    |--Compute Scalar(DEFINE:([Expr1006]=[BVTOOLS].[dbo].[t_1k].[n1]))	1	6	5	Compute Scalar	Compute Scalar	DEFINE:([Expr1006]=[BVTOOLS].[dbo].[t_1k].[n1])	[Expr1006]=[BVTOOLS].[dbo].[t_1k].[n1]	1000	0	0.0001	15	0.005222741	[BVTOOLS].[dbo].[t_1k].[n1], [Expr1006]	NULL	PLAN_ROW	0	1
            |         |--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_1k]))	1	7	6	Table Scan	Table Scan	OBJECT:([BVTOOLS].[dbo].[t_1k])	[BVTOOLS].[dbo].[t_1k].[n1]	1000	0.003865741	0.001257	11	0.005122741	[BVTOOLS].[dbo].[t_1k].[n1]	NULL	PLAN_ROW	0	1
            |--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_100k]))	1	8	3	Table Scan	Table Scan	OBJECT:([BVTOOLS].[dbo].[t_100k])	[BVTOOLS].[dbo].[t_100k].[n1]	100000	0.1216435	0.110157	11	0.2318005	[BVTOOLS].[dbo].[t_100k].[n1]	NULL	PLAN_ROW	0	1
(100000 rows affected)
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_100k'. Scan count 1, logical reads 161, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_1k'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Oracle:

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |   100K|   488K|   464K (21)| 00:01:13 |
|   1 |  SORT UNIQUE NOSORT|        |     1 |     4 |     5  (20)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T_1K   |     1 |     4 |     4   (0)| 00:00:01 |
|   3 |  TABLE ACCESS FULL | T_100K |   100K|   488K|    33   (0)| 00:00:01 |
-----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("T_1K"."N1"=:B1)
consistent gets	500163

Conclusion: Although DISTINCT guarantees the uniqueness and, thus, the scalar subquery returns a single result per driving row, Oracle optimizer didn’t unnest the subquery, which resulted in a suboptimal execution plan.

Query 4 – primary key

In this example I’ll be using a large table with a primary key:

SQL Server:

create table t_1m_pk (
  n1 integer, c1 nvarchar(100), 
  constraint pk_t_1m primary key nonclustered (n1) 
) ;

insert into t_1m_pk
SELECT TOP (1000000) 
  n1 = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id])),
  'A'
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);

Oracle:

create table t_1m_pk ( 
  n1 integer, c1 varchar2(1), 
  constraint pk_1m_pk primary key (n1) 
) ;

insert into t_1m_pk
  select level , 'A'
    from dual
    connect by level <= 1000000;
    
commit ;

begin
  dbms_stats.gather_table_stats ( null, 'T_1M_PK') ;
end ;
/

Query:

select 
  t_100k.*,
  (select t_1m_pk.c1 from t_1m_pk where t_1m_pk.n1 = t_100k.n1) as s_n1
from t_100k;

SQL Server:

StmtText	StmtId	NodeId	Parent	PhysicalOp	LogicalOp	Argument	DefinedValues	EstimateRows	EstimateIO	EstimateCPU	AvgRowSize	TotalSubtreeCost	OutputList	Warnings	Type	Parallel	EstimateExecutions
select     t_100k.*,    (select t_1m_pk.c1 from t_1m_pk where t_1m_pk.n1 = t_100k.n1) as s_n1  from t_100k;	1	1	0	NULL	NULL	1	NULL	100000	NULL	NULL	NULL	11.14075	NULL	NULL	SELECT	0	NULL
  |--Compute Scalar(DEFINE:([Expr1007]=[BVTOOLS].[dbo].[t_1m_pk].[c1]))	1	2	1	Compute Scalar	Compute Scalar	DEFINE:([Expr1007]=[BVTOOLS].[dbo].[t_1m_pk].[c1])	[Expr1007]=[BVTOOLS].[dbo].[t_1m_pk].[c1]	100000	0	0.01	16	11.14075	[BVTOOLS].[dbo].[t_100k].[n1], [Expr1007]	NULL	PLAN_ROW	0	1
       |--Hash Match(Left Outer Join, HASH:([BVTOOLS].[dbo].[t_100k].[n1])=([BVTOOLS].[dbo].[t_1m_pk].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1m_pk].[n1]=[BVTOOLS].[dbo].[t_100k].[n1]))	1	3	2	Hash Match	Left Outer Join	HASH:([BVTOOLS].[dbo].[t_100k].[n1])=([BVTOOLS].[dbo].[t_1m_pk].[n1]), RESIDUAL:([BVTOOLS].[dbo].[t_1m_pk].[n1]=[BVTOOLS].[dbo].[t_100k].[n1])	NULL	100000	0	8.057143	16	11.13075	[BVTOOLS].[dbo].[t_100k].[n1], [BVTOOLS].[dbo].[t_1m_pk].[c1]	NULL	PLAN_ROW	0	1
            |--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_100k]))	1	4	3	Table Scan	Table Scan	OBJECT:([BVTOOLS].[dbo].[t_100k])	[BVTOOLS].[dbo].[t_100k].[n1]	100000	0.1216435	0.110157	11	0.2318005	[BVTOOLS].[dbo].[t_100k].[n1]	NULL	PLAN_ROW	0	1
            |--Table Scan(OBJECT:([BVTOOLS].[dbo].[t_1m_pk]))	1	5	3	Table Scan	Table Scan	OBJECT:([BVTOOLS].[dbo].[t_1m_pk])	[BVTOOLS].[dbo].[t_1m_pk].[n1], [BVTOOLS].[dbo].[t_1m_pk].[c1]	1000000	1.741644	1.100157	16	2.8418	[BVTOOLS].[dbo].[t_1m_pk].[n1], [BVTOOLS].[dbo].[t_1m_pk].[c1]	NULL	PLAN_ROW	0	1
(100000 rows affected)
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_1m_pk'. Scan count 1, logical reads 2348, physical reads 0, read-ahead reads 845, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_100k'. Scan count 1, logical reads 161, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Oracle:

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |   100K|   488K|   181K  (1)| 00:00:29 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T_1M_PK  |     1 |     7 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | PK_1M_PK |     1 |       |     1   (0)| 00:00:01 |
|   3 |  TABLE ACCESS FULL          | T_100K   |   100K|   488K|    33   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("T_1M_PK"."N1"=:B1)
consistent gets	100731

Conclusion: Although the primary key guarantees a single value in the subquery, Oracle didn’t unnest it, which in turn resulted in a suboptimal plan with much more consistent gets.

Summary

In summary, Oracle optimizer sometimes doesn’t unnest the subqueries in the SELECT clause. Therefore, in such cases, it might make sense to check the execution statistics and consider rewriting the query by using a semi or an outer join.

Thanks for sharing

Nenad Noveljic

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.