Cartesian Join with Hierarchical Query

Cartesian join with a hierarchical query (typically on DUAL) seems to be a common way of generating duplicate rows in a row set. For example, the following SQL produces 1000 rows for each row in the underlying table T1.

select idx,n from t1 join 
  (select level idx from dual connect by level <= 1000) on 1=1 
  order by idx,n ;

For instance, if we insert 5 integers between 1001 and 1005 into the table T1, the output of the query above will look like follows:

create table t1 (n number) ;
insert into t1
  select level + 1000
    from dual connect by level < 5 ;
commit ;
exec dbms_stats.gather_table_stats(NULL,'T1');

       IDX          N
---------- ----------
         1       1001
         1       1002
         1       1003
...
      1000       1003
      1000       1004
      1000       1005

5000 rows selected.

Obviously, it isn't that difficult to duplicate rows in a row set. Nevertheless, I'd like to highlight a little problem with this approach. What I mean by that is that the estimated cardinalities are completely wrong in such case:

-----------------------------------------------------------------------------------------------------------------------------                                                                                   
| Id  | Operation                       | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |                                                                                   
-----------------------------------------------------------------------------------------------------------------------------                                                                                   
|   0 | SELECT STATEMENT                |      |      1 |        |   5000 |00:00:00.01 |       3 |       |       |          |                                                                                   
|   1 |  SORT ORDER BY                  |      |      1 |      5 |   5000 |00:00:00.01 |       3 |   160K|   160K|  142K (0)|                                                                                   
|   2 |   MERGE JOIN CARTESIAN          |      |      1 |      5 |   5000 |00:00:00.01 |       3 |       |       |          |                                                                                   
|   3 |    VIEW                         |      |      1 |      1 |   1000 |00:00:00.01 |       0 |       |       |          |                                                                                   
|   4 |     CONNECT BY WITHOUT FILTERING|      |      1 |        |   1000 |00:00:00.01 |       0 |  2048 |  2048 | 2048  (0)|                                                                                   
|   5 |      FAST DUAL                  |      |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |                                                                                   
|   6 |    BUFFER SORT                  |      |   1000 |      5 |   5000 |00:00:00.01 |       3 |  2048 |  2048 | 2048  (0)|                                                                                   
|   7 |     TABLE ACCESS FULL           | T1   |      1 |      5 |      5 |00:00:00.01 |       3 |       |       |          |                                                                                   
-----------------------------------------------------------------------------------------------------------------------------

As you can see, the optimizer assumed that SELECT FROM DUAL CONNECT BY was going to return just a single row. As a consequence, the estimated cardinality for the whole SELECT is wrong by orders of magnitude. Moreover, if this join was embedded in a larger query it could even cause the butterfly-effect on the entire cardinality calculation.

In the case that you just started thinking of adding the CARDINALITY(DUAL,1000) hint to the inline view, I could already tell you that this would work. Or should I better say - at least for this very simple example. Though, I don't recommend doing it because the hint might be "lost" during optimizer transformations which are being applied to more complex queries.

So I'd like to suggest a more robust solution instead. First, you should create a table containing just the sequence of integer values up to the number of rows you'd need to duplicate, like for example:

create table t_int (idx number) ;
insert into t_int
  select level 
    from dual connect by level <= 1000  ;
commit ;
exec dbms_stats.gather_table_stats(NULL,'T_INT');

And then you can replace DUAL with the newly created T_INT in the join:

select idx,n from t1 join t_int on 1=1 
  where idx <= 1000 
  order by idx,n ;

There you go. Since T_INT is a regular table with up-to-date statistics, the cardinalities are going to be correct now:

--------------------------------------------------------------------------------------------------------------------                                                                                            
| Id  | Operation             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |                                                                                            
--------------------------------------------------------------------------------------------------------------------                                                                                            
|   0 | SELECT STATEMENT      |       |      1 |        |   5000 |00:00:00.01 |       7 |       |       |          |                                                                                            
|   1 |  SORT ORDER BY        |       |      1 |   5000 |   5000 |00:00:00.01 |       7 |   232K|   232K|  206K (0)|                                                                                            
|   2 |   MERGE JOIN CARTESIAN|       |      1 |   5000 |   5000 |00:00:00.01 |       7 |       |       |          |                                                                                            
|   3 |    TABLE ACCESS FULL  | T1    |      1 |      5 |      5 |00:00:00.01 |       3 |       |       |          |                                                                                            
|   4 |    BUFFER SORT        |       |      5 |   1000 |   5000 |00:00:00.01 |       4 | 36864 | 36864 |32768  (0)|                                                                                            
|*  5 |     TABLE ACCESS FULL | T_INT |      1 |   1000 |   1000 |00:00:00.01 |       4 |       |       |          |                                                                                            
--------------------------------------------------------------------------------------------------------------------

Thanks for sharing

Nenad Noveljic

Leave a Reply Cancel reply