While analyzing a long-running query on a SQL Server database I discovered a boundary condition which revealed a flaw in the cost calculation of the Anti Semi Join. The wrong cost led to a bad execution plan that caused the query to run “forever”. In this blog post I’ll provide a test case, an analysis, and a possible workaround.
I’ll also show how the max server memory and NOT NULL constraint can influence the execution plan selection.
Setting the Scene
The test case is a simplified version of the mentioned real world scenario:
create table t1 (a integer)

create table t2 (
    a integer not null,
    CONSTRAINT [t2_pk] PRIMARY KEY NONCLUSTERED ([a] ASC)
)

update statistics t1 with rowcount=206057632, pagecount=4121154
update statistics t2 with rowcount=2, pagecount=1
I tweaked the statistics to make the optimizer think that the table t1 is very large and t2 extremely small, containing just two records.
Furthermore, the maximum SQL Server memory is set to 20000 MB (or less). This may not seem important at the moment, but it will turn out later that the max memory limit has a significant impact on the selection of the execution plan.
SELECT name, value FROM sys.configurations
WHERE name like 'max server memory%' ORDER BY name

max server memory (MB)    20000
Anti Semi Join
The following query selects all the records from t1 without corresponding values in t2:
SET SHOWPLAN_text on

select count(a) from t1
where a not in (select a from t2)
option (maxdop 1)

|--Compute Scalar(DEFINE:([Expr1006]=CONVERT_IMPLICIT(int,[Expr1009],0)))
     |--Stream Aggregate(DEFINE:([Expr1009]=COUNT([t1].[a])))
          |--Nested Loops(Left Anti Semi Join, WHERE:([t1].[a] IS NULL OR [t1].[a]=[t2].[a]))
               |--Table Scan(OBJECT:([t1]))
               |--Table Scan(OBJECT:([t2]))
The Anti Semi Join returns the inverted result of the WHERE predicate of the Nested Loops operator. Since the predicate contains an OR, neither Merge nor Hash Match, which would be better options in this case, can be considered as alternatives.
As the next step, let’s see what happens to the execution plan if we “make” the table t2 much larger, again by manipulating its statistics:
SET SHOWPLAN_text off
update statistics t2 with rowcount=1320764,pagecount=146753

|--Compute Scalar(DEFINE:([Expr1006]=CONVERT_IMPLICIT(int,[Expr1009],0)))  26060.84
     |--Stream Aggregate(DEFINE:([Expr1009]=COUNT([t1].[a])))  26060.84
          |--Hash Match(Right Anti Semi Join, HASH:([t2].[a])=([t1].[a]), RESIDUAL:([t1].[a]=[t2].[a]))  26060.83
               |--Index Scan(OBJECT:([t2].[t2_pk]))  1.456122
               |--Nested Loops(Left Anti Semi Join, WHERE:([t1].[a] IS NULL))  24828.88
                    |--Table Scan(OBJECT:([t1]))  3279.373
                    |--Row Count Spool  20626.37
                         |--Top(TOP EXPRESSION:((1)))  0.0032832
                              |--Index Scan(OBJECT:([t2].[t2_pk]))  0.0032831
I cut out all non-relevant information from the SHOWPLAN_ALL output above, leaving only the total cost of the subtree at the end of each line.
Given the high number of records in the inner table t2, the optimizer has decided to replace the Nested Loops with a more efficient Hash Match, which led to a more complex plan: because of the OR operator, the IS NULL condition must now be checked in a separate branch, namely a Nested Loops Anti Semi Join.
At this point, we’ll also make a mental note of the Row Count Spool operator and its cost. Row Count Spool scans its input, counts the number of rows for a given key, and caches the result, which generally makes it a more efficient operator for checking the existence of a row than a Table Scan (see the blog post by Fabiano Amorim for a detailed explanation of the Row Count Spool operator).
Max Server Memory Impact
As we keep increasing the memory, nothing really happens: both the plan and the calculated cost remain the same. However, this is only true until a certain threshold is reached.
Interestingly, setting the max server memory to 30000 MB will cause a change in the execution plan:
USE master
EXEC sp_configure 'max server memory (MB)', 30000
RECONFIGURE WITH OVERRIDE
GO

Configuration option 'max server memory (MB)' changed from 20000 to 30000. Run the RECONFIGURE statement to install.

|--Compute Scalar(DEFINE:([Expr1006]=CONVERT_IMPLICIT(int,[Expr1009],0)))  21857.26
     |--Stream Aggregate(DEFINE:([Expr1009]=COUNT([t1].[a])))  21857.26
          |--Hash Match(Right Anti Semi Join, HASH:([t2].[a])=([t1].[a]), RESIDUAL:([t1].[a]=[t2].[a]))  21857.25
               |--Index Scan(OBJECT:([t2].[t2_pk]))  1.456122
               |--Nested Loops(Left Anti Semi Join, WHERE:([t1].[a] IS NULL))  20625.31
                    |--Table Scan(OBJECT:([t1]))  3279.373
                    |--Top(TOP EXPRESSION:((1)))  16422.8
                         |--Table Scan(OBJECT:([t2]))  16402.19
The Row Count Spool has been replaced by a Table Scan, because the cost of the Table Scan amounts to 16402.19 and is therefore considered by the optimizer to be more efficient than the Row Count Spool which has the cost 20626.37. By changing the maximum server memory and observing the estimated cost, it can be easily verified that the cost is not a function of the memory size for both Row Count Spool and Table Scan.
But then again, if the cost calculation doesn’t depend on the memory limit, why did the optimizer ignore the plan with the Table Scan, despite its lower cost, until the memory was increased? I couldn’t find anything that would explain the observed behavior. Possibly, the optimizer speculates that with less memory it is more likely that some pages of t2 get evicted from the cache during the Nested Loops operation, so it won’t consider the Table Scan unless there is enough memory to keep the whole of t2 cached during execution.
In reality, however, the query runs far faster with the Row Count Spool plan than with the Table Scan plan: the elapsed times are roughly half a minute and “forever”, respectively.
Table Scan Cost
So why has the optimizer massively underestimated the amount of work associated with the Table Scan?
To answer this question, I’ll change the size of the t2 table and observe the estimated cost. In general, the total cost of a Table Scan within a Nested Loops join is the number of records in the outer input multiplied by the CPU cost of a single Table Scan, plus some base cost. Curiously, in this particular case the cost is not a function of the rowcount of the inner table:
update statistics t2 with rowcount=1000,pagecount=100

|--Table Scan(OBJECT:([t2]))  16402.19

update statistics t2 with rowcount=1000000,pagecount=100000

|--Table Scan(OBJECT:([t2]))  16402.19
Even after changing the size of the table by orders of magnitude, the cost hasn’t changed at all! As a consequence, the cost of a Table Scan of a large table will be massively underestimated, misleading the optimizer into a very bad plan.
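To make the expected behavior concrete, the cost model described above can be sketched as simple arithmetic. All constants below are hypothetical placeholders, not the real optimizer coefficients; the point is only that the estimate should scale with the sizes of both inputs:

```sql
-- Sketch of the expected costing of a Table Scan driven by a
-- Nested Loops join (hypothetical coefficients, for illustration only):
-- total cost = base cost + outer rows * cost of one scan of the inner table
declare @outer_rows float = 206057632  -- rows in t1 (from the faked statistics)
declare @scan_cost  float = 0.005      -- hypothetical cost of one scan of t2
declare @base_cost  float = 0.0038     -- hypothetical base cost

select @base_cost + @outer_rows * @scan_cost as expected_table_scan_cost
```

With any sensible per-scan cost, an outer input of 200 million rows should produce an estimate that dwarfs the constant 16402.19 observed above, which underlines how severe the underestimation is.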
To NULL or NOT to NULL – That is the Question
Hence, we ended up with a paradox: in order to improve page life expectancy and overall performance we increased the server memory, but because of the flaw in the cost calculation one query started to run significantly slower after the memory adjustment. I started to look for possible workarounds. An easy way out would have been to rewrite the query to use alternative join methods for returning the same result set, but I strove for a solution that would fix all of the queries following the same pattern without touching the code.
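For completeness, one such rewrite is replacing NOT IN with NOT EXISTS. This is a sketch, not a drop-in replacement: the correlated predicate contains no OR, so the optimizer doesn’t need the extra IS NULL branch, but the two forms treat NULL values of t1.a differently (NOT EXISTS returns rows with a NULL t1.a, NOT IN does not), so they are only equivalent when t1.a contains no NULLs:

```sql
-- Hypothetical rewrite: NOT EXISTS instead of NOT IN.
-- The join predicate is a plain equality, so a Hash or Merge
-- Anti Semi Join can be considered directly.
select count(a)
from t1
where not exists (select 1 from t2 where t2.a = t1.a)
option (maxdop 1)
```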
If we think about the definition of the problem once again, we quickly come to the conclusion that the whole complexity arises from the fact that the optimizer has to create a separate branch in the execution plan just to handle the NULL values of the table t1. If t1.a is mandatory, we can declare the column as NOT NULL, giving the optimizer a cue that it doesn’t need to deal with NULL records. This significantly reduces the realm of execution plan candidates to choose from, making it less probable to hit an optimizer bug under some boundary condition.
Although the column was never NULL on my live system, it was not declared as such. In fact, throughout the whole schema the application vendor avoided NOT NULL constraints for no obvious reason.
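To gauge how widespread such missing constraints are, the catalog views can be queried for nullable columns; a minimal sketch covering user tables only:

```sql
-- List all nullable columns in user tables
select s.name as schema_name, t.name as table_name, c.name as column_name
from sys.columns c
join sys.tables  t on t.object_id = c.object_id
join sys.schemas s on s.schema_id = t.schema_id
where c.is_nullable = 1
order by s.name, t.name, c.name
```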
Therefore, it came as no surprise to me that altering the column t1.a to NOT NULL resulted in a much more efficient and less volatile execution plan:
alter table t1 alter column a integer not null

|--Compute Scalar(DEFINE:([Expr1006]=CONVERT_IMPLICIT(int,[Expr1009],0)))  4517.912
     |--Stream Aggregate(DEFINE:([Expr1009]=Count(*)))  4517.912
          |--Hash Match(Right Anti Semi Join, HASH:([t2].[a])=([t1].[a]))  4394.278
               |--Index Scan(OBJECT:([t2].[t2_pk]))  1.456122
               |--Table Scan(OBJECT:([t1]))  3279.373
This plan has a much lower cost than the one which was devised when NOT NULL was not specified (4517.912 and 21857.26, respectively).
In conclusion, declaring a column as NOT NULL (where appropriate for a given case) not only ensures data integrity, but also provides the optimizer with invaluable information, leading to far better and more stable execution plans.
The problem is reproducible in the SQL Server versions 2008R2 and 2014. In contrast, it seems to be fixed in the versions 2012 and 2016.
Due to a software bug the optimizer can devise a bad plan under following conditions:
- anti semi join is used
- NULL values are allowed in the joining column
- max server memory is large (approx. 30000 MB or more)
The problem can be avoided by declaring the columns as NOT NULL (when feasible), which is good practice anyway, as it provides the optimizer with additional information for devising better and more stable execution plans. The good news is that the problem is fixed in the most recent version (currently 2016).