IS NULL vs. NVL
IS NULL is the SQL predicate for selecting NULL values. Nevertheless, some developers abuse NVL for that purpose. In this blog post, I’ll lay out the drawbacks of the NVL approach and also show how the cardinality estimate has improved in Oracle 21c.
I’ll use a simple table T1 with the following properties:
- The table contains 110,000 rows.
- The column D of data type DATE contains 100,000 null values.
drop table t1 ;
-- 10,000 rows with today's date in column D
create table t1 as select
rownum n,
trunc(sysdate) d
from dual connect by level <= 1e4 ;
-- 100,000 additional rows with NULL in column D
insert into t1
select rownum + 1e4, null
from dual connect by level <= 1e5 ;
commit ;
exec dbms_stats.gather_table_stats(null, 'T1');
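As a sanity check, we can confirm that the gathered statistics reflect the NULL values. The following query against USER_TAB_COL_STATISTICS is optional and not part of the original setup, but NUM_NULLS is the figure the optimizer relies on for the estimates below:
select num_nulls, num_distinct
from user_tab_col_statistics
where table_name = 'T1' and column_name = 'D' ;
-- NUM_NULLS should show 100000 and NUM_DISTINCT 1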
The IS NULL predicate is the most reliable way to get a good cardinality estimate when selecting NULL values, for example:
select * from t1 where d is null ;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 683K| 47 (5)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T1 | 100K| 683K| 47 (5)| 00:00:01 |
--------------------------------------------------------------------------
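By the way, if you want to reproduce the execution plans shown in this post, one possible way is EXPLAIN PLAN followed by DBMS_XPLAN.DISPLAY (the plans here may have been captured differently, for example with DBMS_XPLAN.DISPLAY_CURSOR):
explain plan for
select * from t1 where d is null ;

select * from table(dbms_xplan.display) ;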
The equivalent NVL expression for extracting NULL values is as follows:
select * from t1
where nvl(d, to_date('31.12.2200','dd.mm.yyyy')) = to_date('31.12.2200','dd.mm.yyyy') ;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 683K| 48 (7)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T1 | 100K| 683K| 48 (7)| 00:00:01 |
--------------------------------------------------------------------------
As long as the column D doesn’t contain the value ‘31.12.2200’, the NVL expression evaluates to TRUE only for NULL values. As we can see, the optimizer recognized the intention behind the expression and produced a good cardinality estimate. Jonathan Lewis has already described how the optimizer successfully handles this case.
But using NVL for selecting NULL values has disadvantages. Firstly, the logical correctness of the NVL expression depends on the assumption that the hardcoded value will never become a regular value and be inserted into the table. That, however, might change over the lifespan of the application. In that case, the developers will have to search for all the hardcoded values and rewrite the affected queries. Secondly, NVL can yield a subpar cardinality estimate.
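To illustrate the first drawback, here is a minimal sketch with a purely hypothetical insert: as soon as ‘31.12.2200’ becomes a regular value in column D, the NVL predicate silently returns that row as well, while IS NULL keeps returning exactly the NULL rows:
-- hypothetical: the "magic" date becomes a regular value
insert into t1 values (110001, to_date('31.12.2200','dd.mm.yyyy')) ;
commit ;
-- the NVL predicate now matches 100,001 rows instead of the 100,000 NULL rows
select count(*) from t1
where nvl(d, to_date('31.12.2200','dd.mm.yyyy')) = to_date('31.12.2200','dd.mm.yyyy') ;
-- clean up the hypothetical row before continuing with the examples
delete from t1 where n = 110001 ;
commit ;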
Join with NVL
Let’s slightly modify the query above to select the rows where D is greater than or equal to an in-range threshold or is NULL:
select * from t1 where d >= trunc(sysdate) or d is null ;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 689K| 47 (5)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T1 | 100K| 689K| 47 (5)| 00:00:01 |
--------------------------------------------------------------------------
As we can see, the equivalent query with NVL still gets a good cardinality estimate:
select * from t1 where nvl(d, to_date('31.12.2200','dd.mm.yyyy')) >= trunc(sysdate) ;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 110K| 751K| 52 (14)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T1 | 110K| 751K| 52 (14)| 00:00:01 |
--------------------------------------------------------------------------
But the estimate goes awry after joining the result set above with another table.
Next, I’ll create the table T2, which is similar to T1 but contains no NULL values:
create table t2 as select
rownum n,
trunc(sysdate) d
from dual connect by level <= 1e5 ;
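Depending on your configuration, online statistics gathering during the CTAS may already have collected statistics for T2. If it hasn’t, gather them explicitly, just like for T1:
exec dbms_stats.gather_table_stats(null, 'T2');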
The following query with IS NULL returns 100K rows.
select * from t1, t2
where t1.n = t2.n and ( t2.d <= t1.d or t1.d is null );
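Since the result set is large, a simple COUNT(*) is the easiest way to verify the actual number of rows:
select count(*) from t1, t2
where t1.n = t2.n and ( t2.d <= t1.d or t1.d is null );
-- returns 100000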
The cardinality estimate is close to the actual number of rows:
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 91736 | 1791K| | 315 (2)| 00:00:01 |
|* 1 | HASH JOIN | | 91736 | 1791K| 2048K| 315 (2)| 00:00:01 |
| 2 | TABLE ACCESS FULL| T1 | 110K| 751K| | 46 (3)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T2 | 100K| 1269K| | 48 (3)| 00:00:01 |
-----------------------------------------------------------------------------------
But the estimate is one order of magnitude lower for the equivalent query with NVL:
select * from t1, t2
where t1.n = t2.n and t2.d <= nvl(t1.d,to_date('31.12.2200','dd.mm.yyyy')) ;
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9091 | 177K| | 315 (2)| 00:00:01 |
|* 1 | HASH JOIN | | 9091 | 177K| 2048K| 315 (2)| 00:00:01 |
| 2 | TABLE ACCESS FULL| T1 | 110K| 751K| | 46 (3)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T2 | 100K| 1269K| | 48 (3)| 00:00:01 |
-----------------------------------------------------------------------------------
The execution plan above is from Oracle release 19c (tested on 19.7 and 19.14).
In release 21c (tested on 21.4), the cardinality calculation has improved. In fact, the estimate is identical to that of the query with IS NULL:
select * from t1, t2
where t1.n = t2.n and t2.d <= nvl(t1.d,to_date('31.12.2200','dd.mm.yyyy')) ;
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 91736 | 1791K| | 317 (3)| 00:00:01 |
|* 1 | HASH JOIN | | 91736 | 1791K| 2048K| 317 (3)| 00:00:01 |
| 2 | TABLE ACCESS FULL| T1 | 110K| 751K| | 47 (5)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T2 | 100K| 1269K| | 49 (5)| 00:00:01 |
-----------------------------------------------------------------------------------
Conclusion
In conclusion, IS NULL is a better choice than NVL for filtering NULL values. The logic based on NVL can become incorrect as the data changes. Besides that, cardinality estimates with IS NULL are generally more reliable. Although Oracle has been improving NVL cardinality estimates, you might still hit a corner case where an improvement hasn’t been implemented yet.