Comparison of a Cardinality Estimate in Oracle and SQL Server

This blog post is a short observation on the assumptions that the Oracle and SQL Server optimizers make when estimating the cardinality of a rowset obtained by applying an equality predicate that compares two different columns of the same table, for example:

select count(*) from t where n1 = n2 ;

The demo is set up as follows:

  • The table t has 100,000 rows.
  • The column n1 has 5 distinct values.
  • The column n2 has 10,000 distinct values.
  • The data distribution in both columns is uniform.
  • The most recent versions of both database products are used: Oracle 12.2 and SQL Server 2016.

The conclusions are consistent across various data sets.

Oracle

drop table t ;
create table t (n1 number , n2 number ) ; 

insert into t
  SELECT mod(level,5),mod(level,10000)
  FROM   dual
  CONNECT BY level <= 100000 ;

exec dbms_stats.gather_table_stats( null , 'T' ) ;

column column_name format a3
select column_name,num_distinct ndv,density 
  from user_tab_columns where table_name='T' ;
COL        NDV    DENSITY
--- ---------- ----------
N1           5         .2
N2       10000      .0001

select count(*) from t where n1 = n2 ;
  COUNT(*)
----------
        50

SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    23 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     7 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |    10 |    70 |    23   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("N1"="N2")

Oracle estimated 10 rows: it treats the less selective column (n1) as if it were a constant and uses the density of the more selective column (n2) for the cardinality estimate, i.e. 100,000 rows * 0.0001 = 10.
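If that really is the calculation, the estimate should simply be num_rows multiplied by the density of N2. The following sanity check merely rederives the number from the gathered statistics (it is not an official formula, just an assumption being verified):

-- Assumption: estimated cardinality = num_rows * density(N2)
select t.num_rows * c.density as estimated_rows
  from user_tables t
  join user_tab_columns c on c.table_name = t.table_name
 where t.table_name = 'T' and c.column_name = 'N2' ;
-- expected: 100000 * 0.0001 = 10, matching the Rows column in the plan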

SQL Server

drop table t ;
create table t ( n1 integer , n2 integer ) ;

insert into t select top(100000)
  CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id])) % 5 ,
  CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id])) % 10000
  FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);

set showplan_all on
go
select count(*) from t where n1 = n2 ;

EstimateRows  StmtText
1             select count(*) from t where n1 = n2 ;
1               |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))
1                 |--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))
10000               |--Table Scan(OBJECT:([master].[dbo].[t]), WHERE:([master].[dbo].[t].[n1]=[master].[dbo].[t].[n2]))

SQL Server estimated 10,000 rows, i.e. it assumes that the filtered rowset will contain 10% of the table's rows.
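That corresponds to a fixed 10% selectivity guess. A quick back-of-the-envelope check under that assumption (simply multiplying the row count by 0.1, not an internal formula):

-- Assumption: fixed 10% selectivity guess for a column = column predicate
select count(*) * 0.10 as estimated_rows from t ;
-- 100000 * 0.10 = 10000, matching EstimateRows of the Table Scan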

Thanks for sharing

Nenad Noveljic
