Join Permutations Heatmaps - All-round Database Topics

There are n! possible ways to order n tables in a join. Of course, the optimizer won’t cost all of the permutations to find the best plan. It will rather limit its search space in order to produce a good enough plan within a reasonable amount of time.

The considered permutations are tracked in the CBO trace, like for example:

...
Join order[3]:  T1[T1]#0  T2[T2]#1  T3[T3]#2  T4[T4]#3  T5[T5]#4  T6[T6]#5  T7[T7]#6  T8[T8]#8  T8[T8]#7  T10[T10]#9
...

Generally, the tables with lower cardinality tend to be joined first (“Considering cardinality-based initial join order.”)

I’d like to share the Python script for aggregating and visualizing the number of times a table was considered on each position in the join. Notice that the scripts relies on some non-default Python modules: pandas, seaborn and numpy.

I wrote it, at first, for experimental purposes. But then it turned out to be useful for explaining why sometimes a plan with a lower cost wasn’t chosen. In such case, you can generate the heatmap and check whether the leading tables from the better plan were even considered as leading tables in the suboptimal plan. By the way, that’s exactly what can happen with some query transformations.

To demonstrate the usage, I’ll create 10 tables – t1 having the lowest cardinality and t10 the highest.

declare
  i integer ;
begin
  for i in 1..10
  loop
    execute immediate
      'create table t' || i ||
      ' as select level n from dual connect by level <= ' || power( 4, i) 
  ;
  end loop ;
end ;
/

Then I’ll generate the CBO trace for the cartesian join of these 10 tables:

alter session set tracefile_identifier = 'JOIN_ORDER' ;

ALTER SESSION SET EVENTS='10053 trace name context forever, level 1' ;

explain plan for select t10.* from t1,t2,t3,t4,t5,t6,t7,t8,t8,t10 ;

ALTER SESSION SET EVENTS '10053 trace name context off';

The simplest way to produce the heatmap is as follows:

python join_permutations_heatmaps.py CBO_trace_file

By the way, in the graph above, we can clearly see that the tables with fewer rows are the ones mainly considered for the leading positions.

If you’re an “ASCII type of person” you can specify the –text parameter to get the output in the text format:

python join_permutations_heatmaps.py --text CBO_trace_file

            1     2     3    4    5    6    7    8    9    10
T1[T1]    1031    62     0    0    0    0   12   16   12   22
T2[T2]      62  1031     0    0    0    0   14   16   16   16
T3[T3]       0     0  1093    0    0    0   24   12   18    8
T4[T4]       0     0     0  813  120   68   36   40   39   39
T5[T5]       0     0     0  280  213  162  126  126  124  124
T6[T6]       0     0     0    0  302  232  168  153  150  150
T7[T7]       0     0     0   62  160  189  186  186  186  186
T8[T8]      31    31    62    0  240  336  395  404  406  405
T8[T8]      31    31    62    0  240  336  395  404  406  405
T10[T10]    31    31     0    0  120  168  194  202  204  205

Last but not least, complex queries may contain many query blocks, each of which is costed separately. By default, the heatmaps will be generated for all of them. However, you can limit the scope by specifiying a query block to focus on: –qb QB.

Thanks for sharing

2 Comments


June 22, 2019
Kyle Hailey

This looks cool!

Interesting / strange that the bigger tables were considered for the first position join T8[T8] and T10[T10] while 3-7 weren’t (is it a typo that T8[T8] is there twice ?).

Kyle
- Reply
  
  June 22, 2019
  Nenad Noveljic
  
  Thank you, Kyle, for your feedback.
  Well spotted – T8 appears twice in the heatmap, because I inadvertently referenced it twice (and omitted T9) in the query:
  explain plan for select t10.* from t1,t2,t3,t4,t5,t6,t7,t8,t8,t10 ;
  Unlike T10, T3 indeed never appeared in the first position. A possible explanation could be that there’s a heuristic rule to allocate a relatively small number of permutations where the biggest tables are being pushed away from the tail toward the first positions. A rationale – I’m just speculating – might be to examine the search subspace in which is, generally, less probable, though not impossible, to find an optimal plan.

Nenad Noveljic

2 Comments

Leave a Reply Cancel reply