Tabular Data Stream (TDS) Encryption Mechanics
TDS encryption can cause dramatic scalability problems in SQL Server 2014. As we shall see later, those performance bottlenecks don’t arise due to the CPU overhead inherent to encryption. Rather, it’s a consequence of how the task is being scheduled within the SQL Server.
First, let me summarize how TDS encryption works in SQL Server 2014, as explained here. A thread calls a Windows OS kernel function to encrypt a message. As this happens, SQL Server hands over the thread control to Windows.
This back and forth context switching causes delays in query execution, that are exposed through the wait event PREEMPTIVE_OS_ENCRYPTMESSAGE.
Fortunately, Microsoft improved the implementation in the SQL Server versions 2016 and 2017 – no context switching is performed in these releases.
Let me explain the methodology used for reaching the conclusions above.
Measurement
The following SQL batch generates lots of messages. At the end, it collects the following metrics: time elapsed, the number of context switches and PREEMPTIVE_OS_ENCRYPTMESSAGE wait statistics.
select @@spid
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
select @@VERSION
select encrypt_option from sys.dm_exec_connections where session_id = @@SPID
DECLARE @starttime datetime, @endtime datetime
DECLARE @cnt int = 0, @ts_start int ;
DECLARE @context_switches_before int, @context_switches_after int ;
SET @starttime =getdate() ;
select @context_switches_before = sum(context_switches_count)
from sys.dm_os_schedulers ;
--load
BEGIN
WHILE @cnt < 100000000
BEGIN
SET @cnt = @cnt + 1;
END;
END
select @context_switches_after = sum(context_switches_count)
from sys.dm_os_schedulers ;
select ( @context_switches_after - @context_switches_before ) context_switches ;
SET @endtime = GETDATE() ;
SELECT Datediff(s, @starttime, @endtime) elapsed_seconds
select * from sys.dm_os_wait_stats where wait_type = 'PREEMPTIVE_OS_ENCRYPTMESSAGE' ;
While the workload is running, we 're collecting the task's CPU time with the following script:
declare @spid int = 78, @i int = 1, @samples int = 120 ;
begin
while ( @i <= @samples )
begin
select cpu_time,total_elapsed_time, status, wait_type, wait_time,
scheduler_id
from sys.dm_exec_requests r
where session_id = @spid
waitfor delay '00:00:01'
set @i = @i + 1
end
end ;
A side note: it might seem easier - and more intuitive - to setup an Extended Events (XE) trace to collect the CPU time for the SQL batch. However, not only does the XE tracing provide an inaccurate information, but it also introduces a significant overhead, which would skew the test results. I elaborated on that in my previous blog post.
Results
I ran the tests on SQL Server 2014 and 2017 using both encrypted and non-encrypted connections. The following table clearly highlights the problem:
2014 enc | 2014 non-enc | 2017 enc | 2017 non-enc | |
---|---|---|---|---|
elapsed time | 99690 | 62127 | 66196 | 62442 |
CPU time | 84490 | 60764 | 65114 | 63165 |
PREEMPTIVE_OS_ENCRYPTMESSAGE | 28266 | 0 | 0 | 0 |
context switches | 1273235 | 1117 | 508 | 500 |
Legend:
2014 enc | SQL Server 2014 with encrpytion configured |
2014 non-enc | SQL Server 2014 without encrpytion configured |
2017 enc | SQL Server 2017 with encrpytion configured |
2017 non-enc | SQL Server 2017 without encrpytion configured |
elapsed time | elapsed time in ms |
CPU time | the time the SQL batch spent on CPU in ms |
PREEMPTIVE_OS_ENCRYPTMESSAGE | the total time waited on PREEMPTIVE_OS_ENCRYPTMESSAGE for the whole instance during the test in ms |
context switches | the total number of context switches for the whole instance during the test |
Summary
TDS encryption has a negligible performance impact in SQL Server 2017. In contrast, turning the encryption on can be detrimental to performance in SQL Server 2014, especially when lots of messages are being sent to the clients. This performance decrease isn't caused by the encryption itself - it's rather a consequence of a suboptimal thread handling.