This blog post is a short observation on assumptions that Oracle and SQL server optimizers make to estimate the cardinality of a rowset obtained after applying an equality predicate which contains different columns of the same table, for example:
select count(*) from t where n1 = n2 ;
The demo is set up as follows:
- The table t has 100,000 rows.
- The column n1 has 5 distinct values.
- The column n2 has 10,000 distinct values.
- The data distribution in both columns is uniform.
- The most recent versions of both database products are used: Oracle 12.2 and SQL Server 2016.
The conclusions are consistent over various data sets.
Oracle
drop table t ;
create table t (n1 number , n2 number ) ;
insert into t
SELECT mod(level,5),mod(level,10000)
FROM dual
CONNECT BY level <= 100000 ;
exec dbms_stats.gather_table_stats( null , 'T' ) ;
column column_name format a3
select column_name,num_distinct ndv,density
from user_tab_columns where table_name='T' ;
COL NDV DENSITY
--- ---------- ----------
N1 5 .2
N2 10000 .0001
select count(*) from t where n1 = n2 ;
COUNT(*)
----------
50
SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 23 (100)| |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
|* 2 | TABLE ACCESS FULL| T | 10 | 70 | 23 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("N1"="N2")
Oracle treats the less selective column (n1) as a constant and uses the density of the more selective column (n2) for the cardinality estimation.
SQL Server
drop table t ;
create table t ( n1 integer , n2 integer ) ;
insert into t select top(100000)
CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id])) % 5 ,
CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id])) % 10000
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);
set showplan_all on
select count(*) from t where n1 = n2 ;
EstimateRows StmtText
1 select count(*) from t where n1 = n2 ;
1 |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))
1 |--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))
10000 |--Table Scan(OBJECT:([master].[dbo].[t]), WHERE:([master].[dbo].[t].[n1]=[master].[dbo].[t].[n2]))
SQL Server makes the assumption that the filtered rowset will contain 10% rows of the table.