Memory Leak During Parsing

In this blog post I’ll be exploring the behaviour of the Oracle function qcplgte and describing edge conditions that could lead to a memory leak during query parsing. The analysis was conducted on a Oracle 12.2 database and Solaris 11.

qcplgte

qcplgte is one of the auxiliary functions in Oracle database that underpin SQL parsing. In particular, this function divides the SQL text into several parts. In Oracle 12.2, the function receives a pointer through the second argument. The address to the next part of the SQL string to parse is stored at the 8 bytes offset. After parsing the portion of the SQL text, the function will update the same memory location with the pointer to the substring for the next parsing stage.

The following picture shows how the function finds the SQL text to parse.

qcplgte

qcplgte

After having figured out the input, it is fairly easy to come up with gdb commands which will display all of the parsing stages:

break qcplgte
set pagination off
commands 1
silent
x/s *(uint64_t *)($rsi+0x8)
continue
end

Here’s the short explanation of the commands above: According to x64 calling convention for System V the second parameter is passed through the %rsi register. The pointer to the SQL text is stored in the memory location %rsi+8. The OS is 64-bit, therefore casting to uint64_t when dereferencing %rsi+0x8. Finally, x/s will dereference the pointer to the (sub)string which is stored on the memory location %rsi+0x8.

Let’s trace the following simple query by using the gdb script above:

select a from t ;

where t is a very simple table created like this:

create table t (a number) ;
exec dbms_stats.gather_table_stats (USER,'T') ;

Note: Gathering statistics before running the query is important to avoid the noise which would be otherwise created by recursive dynamic sampling queries.

The excerpt from the gdb output:

0xffff80ffbfffb3c8:     "select a from t"
0xffff80ffbfffb3ce:     " a from t"
0xffff80ffbfffb3d0:     " from t"
0xffff80ffbfffb3cf:     "a from t"
0xffff80ffbfffb3d0:     " from t"
0xffff80ffbfffb3cf:     "a from t"
0xffff80ffbfffb3d0:     " from t"
0xffff80ffbfffb3d5:     " t"
0xffff80ffbfffb3d7:     ""
0xffff80ffbfffb3d6:     "t"
...

An interesting observation we can make by looking at the output above is that parsing is not linear. What I mean by that is the function had to examine some substrings, such as the one starting at the address 0xffff80ffbfffb3cf, twice even for such a simple query.

Footprint

Before making the initial query gradually more complex, we have to come up with a method for measuring the parsing footprint. For this reason, I wrote a DTrace script which collects the following information:

  • number of qcplgte calls
  • number of mmap calls
  • allocated heap memory
#pragma D option quiet

pid$target::mmap:entry 
/ (int)arg4 == -1 /
{
  @bytes=sum(arg1);
  @count_mmap=count();
}

pid$target::qcplgte:entry {
  @count_qcplgte=count();
}

END {
  printa("%@d Bytes \n", @bytes) ;
  printa("%@d mmap calls \n", @count_mmap) ;
  printa("%@d qcplgte calls \n", @count_qcplgte) ;
}

mmap is the Kernel function for mapping files into the memory:

void *mmap(void *addr, size_t len, int prot, int flags,
            int fildes, off_t off);

A special case is when the file descriptor -1 (5th argument) is specified. In this case, mmap will just allocate an anon segment of the size specified with the second argument. This is how Oracle parsing module is allocating heap memory.

So, it’s time to check the footprint of our simple query:

23 qcplgte calls

Just 23 qcplgte calls were counted, not a single memory allocation was done.

Nonlinearity

Something interesting is going to happen if we enclose the selected column in couple of brackets :

select 
  case when 
  (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
  a <=1
  )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
  then 10
  else null 
  end
from t ;

The number of qcplgte calls went through the roof, but still no memory allocations were recorded:

1679092 qcplgte calls

Also, I added the “case when” just to amplify the nonlinearity effect.

Memory Allocations

Next, we are going to observe what is going to happen after we make the text of the query significantly bigger by embedding a huge comment.

select /* insert some huge comment here */
  case when 
  (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
  t.a <=1
  )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
  then 10
  else null 
  end
  from sys.t;

(You can download the query which I used in this test here.)

24117248 Bytes mmap 
22 mmap calls
1678213 qcplgte calls

It can be seen that making the query text bigger led to memory allocations.

Besides that, I started to reference the table (in the from clause) fully qualified by adding the schema name. Because of that, the allocated memory doubled. This means, if we run the query above without specifying the schema name it will do only 11 mmap calls and allocate only half as much memory.

Finally, something unexpected will happen when we fully qualify the column in the select clause:

select /* insert some huge comment here */
  case when 
  (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
  sys.t.a <=1
  )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
  then 10
  else null 
  end
  from sys.t;
998506496 Bytes mmap 
944 mmap calls
1679989 qcplgte calls 

Even though the number of qcplgte calls remained the same, the allocated memory increased by 41 times after we fully qualified the column in the select.

Oracle 11.2.0.4

As opposed to Oracle 12c, referencing fully qualified column name hasn’t increased the memory consumption in Oracle 11.2. The same query allocates only 11 MB in Oracle 11.2:

12451840 Bytes mmap 
8 mmap calls
949 qcplgte calls 

Real World Example

Frankly, I don’t know why somebody would want to reference the columns name by specifying the schema name, but apparently there are some applications around that generate such queries. For instance, Josué Ribeiro constructed a reproducible test case which is consuming tens of gigabaytes of memory on a 12c database until it hits the ORA-04030 error. In contrast, the same SQL runs smoothly on the 11.2 release. Also here, removing the schema name has proven to be an effective workaround.

Summary

To sum up, qcplgte is an auxiliary parsing function whose purpose is to extract the portion of the SQL text which will be parsed next. By observing pointer values it can be concluded that the parsing process needs to revisit certain places over and over again. In doing so, it will have to allocate heap memory for large (in respect to the length of the SQL text) queries. The memory consumption dramatically increases when the selected columns are referenced fully qualified including schema name in a 12c database. Fortunately, it is enough to remove the schema name when referencing the columns in the select to bring memory allocations back to the normal levels.

Update August 20, 2020

Oracle fixed this problem in 18c:

12c18c19c
allocated (MB)

1979

35

24

# mmap calls

1380

26

22

# qcplgte calls335997816799891679989

I tested with the following patch levels: 12.2.0.1.180116, 18.5.0.0.190115 and 19.7.0.0.200414.

Thanks for sharing

Nenad Noveljic

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.