In this blog post I’ll be exploring the behaviour of the Oracle function qcplgte and describing edge conditions that could lead to a memory leak during query parsing. The analysis was conducted on a Oracle 12.2 database and Solaris 11.
qcplgte
qcplgte is one of the auxiliary functions in Oracle database that underpin SQL parsing. In particular, this function divides the SQL text into several parts. In Oracle 12.2, the function receives a pointer through the second argument. The address to the next part of the SQL string to parse is stored at the 8 bytes offset. After parsing the portion of the SQL text, the function will update the same memory location with the pointer to the substring for the next parsing stage.
The following picture shows how the function finds the SQL text to parse.
After having figured out the input, it is fairly easy to come up with gdb commands which will display all of the parsing stages:
break qcplgte
set pagination off
commands 1
silent
x/s *(uint64_t *)($rsi+0x8)
continue
end
Here’s the short explanation of the commands above: According to x64 calling convention for System V the second parameter is passed through the %rsi register. The pointer to the SQL text is stored in the memory location %rsi+8. The OS is 64-bit, therefore casting to uint64_t when dereferencing %rsi+0x8. Finally, x/s will dereference the pointer to the (sub)string which is stored on the memory location %rsi+0x8.
Let’s trace the following simple query by using the gdb script above:
select a from t ;
where t is a very simple table created like this:
create table t (a number) ;
exec dbms_stats.gather_table_stats (USER,'T') ;
Note: Gathering statistics before running the query is important to avoid the noise which would be otherwise created by recursive dynamic sampling queries.
The excerpt from the gdb output:
0xffff80ffbfffb3c8: "select a from t"
0xffff80ffbfffb3ce: " a from t"
0xffff80ffbfffb3d0: " from t"
0xffff80ffbfffb3cf: "a from t"
0xffff80ffbfffb3d0: " from t"
0xffff80ffbfffb3cf: "a from t"
0xffff80ffbfffb3d0: " from t"
0xffff80ffbfffb3d5: " t"
0xffff80ffbfffb3d7: ""
0xffff80ffbfffb3d6: "t"
...
An interesting observation we can make by looking at the output above is that parsing is not linear. What I mean by that is the function had to examine some substrings, such as the one starting at the address 0xffff80ffbfffb3cf, twice even for such a simple query.
Footprint
Before making the initial query gradually more complex, we have to come up with a method for measuring the parsing footprint. For this reason, I wrote a DTrace script which collects the following information:
- number of qcplgte calls
- number of mmap calls
- allocated heap memory
#pragma D option quiet
pid$target::mmap:entry
/ (int)arg4 == -1 /
{
@bytes=sum(arg1);
@count_mmap=count();
}
pid$target::qcplgte:entry {
@count_qcplgte=count();
}
END {
printa("%@d Bytes \n", @bytes) ;
printa("%@d mmap calls \n", @count_mmap) ;
printa("%@d qcplgte calls \n", @count_qcplgte) ;
}
mmap is the Kernel function for mapping files into the memory:
void *mmap(void *addr, size_t len, int prot, int flags,
int fildes, off_t off);
A special case is when the file descriptor -1 (5th argument) is specified. In this case, mmap will just allocate an anon segment of the size specified with the second argument. This is how Oracle parsing module is allocating heap memory.
So, it’s time to check the footprint of our simple query:
23 qcplgte calls
Just 23 qcplgte calls were counted, not a single memory allocation was done.
Nonlinearity
Something interesting is going to happen if we enclose the selected column in couple of brackets :
select
case when
(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
a <=1
)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
then 10
else null
end
from t ;
The number of qcplgte calls went through the roof, but still no memory allocations were recorded:
1679092 qcplgte calls
Also, I added the “case when” just to amplify the nonlinearity effect.
Memory Allocations
Next, we are going to observe what is going to happen after we make the text of the query significantly bigger by embedding a huge comment.
select /* insert some huge comment here */
case when
(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
t.a <=1
)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
then 10
else null
end
from sys.t;
(You can download the query which I used in this test here.)
24117248 Bytes mmap
22 mmap calls
1678213 qcplgte calls
It can be seen that making the query text bigger led to memory allocations.
Besides that, I started to reference the table (in the from clause) fully qualified by adding the schema name. Because of that, the allocated memory doubled. This means, if we run the query above without specifying the schema name it will do only 11 mmap calls and allocate only half as much memory.
Finally, something unexpected will happen when we fully qualify the column in the select clause:
select /* insert some huge comment here */
case when
(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
sys.t.a <=1
)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
then 10
else null
end
from sys.t;
998506496 Bytes mmap
944 mmap calls
1679989 qcplgte calls
Even though the number of qcplgte calls remained the same, the allocated memory increased by 41 times after we fully qualified the column in the select.
Oracle 11.2.0.4
As opposed to Oracle 12c, referencing fully qualified column name hasn’t increased the memory consumption in Oracle 11.2. The same query allocates only 11 MB in Oracle 11.2:
12451840 Bytes mmap
8 mmap calls
949 qcplgte calls
Real World Example
Frankly, I don’t know why somebody would want to reference the columns name by specifying the schema name, but apparently there are some applications around that generate such queries. For instance, Josué Ribeiro constructed a reproducible test case which is consuming tens of gigabaytes of memory on a 12c database until it hits the ORA-04030 error. In contrast, the same SQL runs smoothly on the 11.2 release. Also here, removing the schema name has proven to be an effective workaround.
Summary
To sum up, qcplgte is an auxiliary parsing function whose purpose is to extract the portion of the SQL text which will be parsed next. By observing pointer values it can be concluded that the parsing process needs to revisit certain places over and over again. In doing so, it will have to allocate heap memory for large (in respect to the length of the SQL text) queries. The memory consumption dramatically increases when the selected columns are referenced fully qualified including schema name in a 12c database. Fortunately, it is enough to remove the schema name when referencing the columns in the select to bring memory allocations back to the normal levels.
Update August 20, 2020
Oracle fixed this problem in 18c:
12c | 18c | 19c | |
---|---|---|---|
allocated (MB) | 1979 | 35 | 24 |
# mmap calls | 1380 | 26 | 22 |
# qcplgte calls | 3359978 | 1679989 | 1679989 |
I tested with the following patch levels: 12.2.0.1.180116, 18.5.0.0.190115 and 19.7.0.0.200414.