Table Spool Performance Problem

Introduction

While exploring Nested Loop Join Heuristics, I noticed that the SQL Server Query Opimizer had introduced a Lazy Table Spool operator in a Nested Loop Join (NLJ). In particular, the Lazy Table Spool appeared in the inner table of a simple NLJ which didn’t have any additional predicates. Furthermore, the number of logical reads exploded due to the Table Spool operator.

Setting the Scene

To demonstrate this anomaly, I’ll create two test tables, one large with a milion records and one small with just two records:


SELECT TOP (1000000)
id = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id])),
a = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
INTO t_large
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);

create table t_small (b integer,c integer) ;
insert into t_mall values (1,1) ;
insert into t_mall values (2,2) ;

The query to populate the large table with numbers is used by courtesy of Aaron Bertrand.

The execution plans will be analyzed for the following query which simply joins both of the tables:

select a from t_small s join t_large l on l.id = s.b option (recompile)

Hash Join

As expected, the optimizer chose the Hash Join algorithm:


StmtText	TotalSubtreeCost
select a from t_small s join t_large l on l.id = s.b option (recompile)	10.49784
  |--Hash Match(Inner Join, HASH:([s].[b])=([l].[id]), RESIDUAL:([master].[dbo].[t_small].[b] as [s].[b]=[master].[dbo].[t_large].[id] as [l].[id]))	10.49784
       |--Table Scan(OBJECT:([master].[dbo].[t_small] AS [s]))	0.0032842
       |--Table Scan(OBJECT:([master].[dbo].[t_large] AS [l]))	3.94106

Nested Loop Join With Table Spool

However, if we enforce a NLJ with the hint, Query Optimizer will introduce the Lazy Table Spool of the inner table resulting in the total estimated cost 13.27454.


StmtText	PhysicalOp	LogicalOp	EstimateRows	EstimateIO	EstimateCPU	TotalSubtreeCost	EstimateExecutions
select a from t_small s join t_large l on l.id = s.b option (loop join, recompile)	NULL	NULL	2.153138	NULL	NULL	13.27454	NULL
  |--Nested Loops(Inner Join, WHERE:([master].[dbo].[t_small].[b] as [s].[b]=[master].[dbo].[t_large].[id] as [l].[id]))	Nested Loops	Inner Join	2.153138	0	8.36	13.27454	1
       |--Table Scan(OBJECT:([master].[dbo].[t_small] AS [s]))	Table Scan	Table Scan	2	0.003125	0.0001592	0.0032842	1
       |--Table Spool	Table Spool	Lazy Spool	1000000	0.01	0.1801002	4.31126	2
            |--Table Scan(OBJECT:([master].[dbo].[t_large] AS [l]))	Table Scan	Table Scan	1000000	2.840903	1.100157	3.94106	1

Nested Loop Join Without Table Spool

Next, let’s check what will be the cost of the NLJ without the Table Spool:


StmtText	PhysicalOp	LogicalOp	EstimateRows	EstimateIO	EstimateCPU	TotalSubtreeCost	EstimateExecutions
select a from t_small s join t_large l on l.id = s.b option (loop join, recompile)	NULL	NULL	2.153138	NULL	NULL	14.00442	NULL
  |--Nested Loops(Inner Join, WHERE:([master].[dbo].[t_small].[b] as [s].[b]=[master].[dbo].[t_large].[id] as [l].[id]))	Nested Loops	Inner Join	2.153138	0	8.36	14.00442	1
       |--Table Scan(OBJECT:([master].[dbo].[t_small] AS [s]))	Table Scan	Table Scan	2	0.003125	0.0001592	0.0032842	1
       |--Table Scan(OBJECT:([master].[dbo].[t_large] AS [l]))	Table Scan	Table Scan	1000000	2.840981	1.100078	5.041138	2

Note: the execution plan without the Table Spool was produced by switching off the BuildSPool query transformation:

DBCC RULEOFF('BuildSpool')

Query optimizer estimated cost 14.00442 for the plan without the Table Spool.

Scrutinizing Table Spool

In this case, there aren’t any filter predicates on t_large, so the whole table will be spooled into the tempdb. Thus, I can’t see any reason, why the calculated cost of the plan with the Table Spool should be lower than that of the plan without the Table Spool, 13.27454 and 14.00442, respectively? Not only does the spooling into the working table in the tempdb introduce an IO overhead, but also why would the join with the working table in the tempdb be more efficient than the join with the regular table in a user database?

However, spooling would generally make sense in the cases where there are filter predicates on the inner table, because the Nested Loop would be executed only on a subset of data of the inner table that is materialized in the tempdb as a result of the Table Spool operator.

Execution Statistics

Finally, I’ll collect the IO and time statistics in order to measure by how much the cost estimation of Table Spool deviates from reality:

With Table Spool:


Table 't_large'. Scan count 1, logical reads 3832, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 1, logical reads 2864860, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 't_small'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 3171 ms,  elapsed time = 3233 ms.

Without Table Spool:


Table 't_large'. Scan count 1, logical reads 7664, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 't_small'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 235 ms,  elapsed time = 237 ms.

Obviously, something got entirely wrong with the Table Spool. As expected, the inner table t_large was scanned only once for creating the Worktable (3832 logical reads). However, the NLJ to the inner Worktable led to the devastating 2864860 logical reads, which could not be explained at all! As a consequence, the query execution time skyrocketed from 237 ms to 3233 ms.

In contrast, the statistics for the plan without the Table Spool were as expected – the inner table t_large was scanned exactly 7664 times; once for each of the two records of the outer table t_small.

Eliminating Table Spool

Please note, that the test case above has been engineered in a lab environment to show that there might be issues with the Table Spool. Hence, it might be worth paying closer attention when this operator appears in the execution plan. In this particular case, Table Spool can be completely eliminated by proper indexing:

CREATE NONCLUSTERED INDEX [idx_t_large_id]
ON [dbo].[t_large] ([id])
INCLUDE ([a])

StmtText TotalSubtreeCost
select a from t_small s join t_large l on l.id = s.b 0.00673376
  |--Nested Loops(Inner Join, OUTER REFERENCES:([s].[b])) 0.00673376
       |--Table Scan(OBJECT:([master].[dbo].[t_small] AS [s])) 0.0032842
       |--Index Seek(OBJECT:([master].[dbo].[t_large].[idx_t_large_id] AS [l]), SEEK:([l].[id]=[master].[dbo].[t_small].[b] as [s].[b]) ORDERED FORWARD) 0.0034412
Thanks for sharing

Nenad Noveljic

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.