Large Memory Allocations due to Query Parsing

I was recently alerted to an ORA-04036 (PGA memory allocation limit exceeded) error on a 12.1 Oracle database. As the problem originated in the test environment and hence wasn’t classified as urgent, I decided to investigate the root cause instead of simply increasing the pga_aggregate_limit. That turned out to be a good decision, because the analysis revealed an underlying parsing issue.

In this blog post, I’ll describe the troubleshooting methodology which is underpinned by ASH dictionary views (subject to licensing!), Jonathan Lewis’ My Session Workload tool, DTrace and Flame Graphs.

First, I selected the samples from v$active_session_history and aggregated the allocated PGA memory by force_matching_signature:

select a.force_matching_signature,max(pga_allocated),count(*) 
 from v$active_session_history a 
 where
 sample_time between
 to_date('10.02.2017 15:34:30','dd.mm.yyyy hh24:mi:ss') and
 to_date('10.02.2017 15:34:32','dd.mm.yyyy hh24:mi:ss')
 group by a.force_matching_signature 
 order by max(pga_allocated) desc ;

FORCE_MATCHING_SIGNATURE   MAX(PGA_ALLOCATED)   COUNT(*)
     7.92426005037451E18           4544622592         28

Second, I found the SQL text of the largest PGA consumer by querying the shared pool based on the force_matching_signature obtained in the first step:

select distinct to_char(sql_fulltext) from v$sql s , v$active_session_history a  
  where s.force_matching_signature = a.force_matching_signature and
    sample_time between
      to_date('10.02.2017 15:34:30','dd.mm.yyyy hh24:mi:ss') and
      to_date('10.02.2017 15:34:32','dd.mm.yyyy hh24:mi:ss')
;

select ...

Note: I replaced the query text with select ... throughout the post, because the SQL is application-specific and as such wouldn’t provide any useful information for explaining the troubleshooting methodology.

Third, I reproduced the problem by running the query in question:

set serveroutput on size 1000000
exec snap_my_stats.start_snap ;
select  ... ;
exec snap_my_stats.end_snap ;

Name                            Value
----                            -----
DB time                         517
session pga memory max          275,644,416
parse time elapsed              431

Note: the package snap_my_stats is part of Jonathan Lewis’ My Session Workload tool.
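
Conceptually, the tool remembers the session statistics from v$mystat at start_snap and reports the deltas at end_snap. The package below is only a rough sketch of that idea under my own naming (snap_sketch is a hypothetical stand-in, not Jonathan Lewis’ code), for readers who want to see the mechanics without downloading the original tool:

-- Rough sketch of the snapshot idea (hypothetical snap_sketch package,
-- not the original snap_my_stats code): remember v$mystat at start_snap
-- and print the deltas at end_snap.
create or replace package snap_sketch as
  procedure start_snap;
  procedure end_snap;
end snap_sketch;
/

create or replace package body snap_sketch as
  type t_stats is table of number index by varchar2(128);
  g_before t_stats;

  -- remember the current values of all session statistics
  procedure start_snap is
  begin
    for r in (select n.name, s.value
                from v$mystat s
                join v$statname n on n.statistic# = s.statistic#) loop
      g_before(r.name) := r.value;
    end loop;
  end start_snap;

  -- print only the statistics that changed since start_snap
  procedure end_snap is
    l_delta number;
  begin
    for r in (select n.name, s.value
                from v$mystat s
                join v$statname n on n.statistic# = s.statistic#) loop
      l_delta := r.value - g_before(r.name);
      if l_delta <> 0 then
        dbms_output.put_line(rpad(r.name, 45) ||
                             to_char(l_delta, '999,999,999,999'));
      end if;
    end loop;
  end end_snap;
end snap_sketch;
/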

By looking at the statistics above, we can draw the following conclusions:

  • the test confirmed the high PGA usage observed in v$active_session_history
  • the parse time is significantly higher than the execution time
    (execution time = DB time - parse time elapsed)
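
With the numbers above, and keeping in mind that both statistics are reported in centiseconds, that works out to roughly 517 - 431 = 86 centiseconds (0.86 s) of execution against 4.31 s of parsing.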

At that point, I wondered whether the PGA memory was used for parsing or for execution. This question was easily answered by collecting the statistics for explain plan for select ... instead of for the query execution:

set serveroutput on size 1000000
exec snap_my_stats.start_snap ;
explain plan for select ...
exec snap_my_stats.end_snap ;

Name                            Value
----                            -----
session pga memory max          182,386,688
parse time elapsed              452

As we can see, the largest part of the PGA was allocated during parsing. To gain additional insight, I collected Oracle kernel function stacks during query optimization with the following DTrace script:

profile:::profile-997Hz
/ pid == $target /
{
  @[ustack()] = count() ;
}
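
For completeness, this is roughly how such a script can be run; ustacks.d and the PID 12345 below are placeholders for the script file and the Oracle server process performing the parse:

# ustacks.d contains the D script above; 12345 stands for the PID of the
# dedicated server process executing the parse
dtrace -s ustacks.d -p 12345 -o ustacks.out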

After spooling the stacks into the output file ustacks.out, I generated the flame graph:

stackcollapse.pl ustacks.out > ustacks.folded
flamegraph.pl ustacks.folded > ustacks.svg

Note: Since the flame graph is a heavy file which would massively slow down the browser, I didn’t embed it here but am including a download link instead. I recommend downloading the file and looking into it on a device with enough processing power. Right-click here to download the flame graph.

The function that stands out in the flame graph is kkojnp. A search of the MOS Knowledge Base with the keywords “slow parsing kkojnp” returned Bug 22225899 : SLOW PARSE FOR COMPLEX QUERY. The bug note suggests setting the hidden parameter _optimizer_squ_bottomup to FALSE as a workaround.

Indeed, the workaround reduced the parse time from 4.52 seconds to just 270 msec. Furthermore, the PGA usage dropped from 174 MB to 19 MB.

alter session set "_optimizer_squ_bottomup" = FALSE ;

set serveroutput on size 1000000
exec snap_my_stats.start_snap ;
explain plan for select...
exec snap_my_stats.end_snap ;

Name                            Value
----                            -----
session pga memory max          19,791,872
parse time elapsed              27

So far I knew that disabling _optimizer_squ_bottomup reduced both the parsing time and the memory footprint, but did it introduce any inefficiencies into the execution plan? To answer this question, I compared the consistent gets of both execution plans:

-- without the workaround
set serveroutput on size 1000000
exec snap_my_stats.start_snap ;
select ...
exec snap_my_stats.end_snap ;

Name                            Value
----                            -----
DB time                         543
consistent gets                 312,986


-- with the workaround
alter session set "_optimizer_squ_bottomup" = FALSE ;

set serveroutput on size 1000000
exec snap_my_stats.start_snap ;
select  ...
exec snap_my_stats.end_snap ;

Name                            Value
----                            -----
DB time                         23
consistent gets                 2,369

Not only did the workaround reduce the parsing time, but the more quickly produced execution plan was also superior to the original one. In this particular case, I observed the problem on an Informatica Powercenter repository database and decided to disable _optimizer_squ_bottomup in the spfile.

However, please don’t take the result of this case study as a general recommendation to disable _optimizer_squ_bottomup on all your databases. Instead, consider it only as a possible solution if you hit severe parsing issues. Should you apply it, make sure that you thoroughly test the application for any side effects.
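
If only a few statements are affected, a more targeted alternative to an spfile change might be to scope the setting to an individual statement with the opt_param hint. The sketch below assumes that the hint accepts this hidden parameter, so verify in the outline data of the resulting plan that it was actually honored:

-- statement-level scoping instead of a session- or system-wide change
-- (assumption: opt_param accepts this hidden parameter; check the outline data)
select /*+ opt_param('_optimizer_squ_bottomup' 'false') */ ... ;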

Last but not least, if you liked this post, you might also be interested in another article about query parsing performance: http://nenadnoveljic.com/blog/excessive-query-parsing-time-with-long-in-lists-and-extended-statistics/ .

Thanks for sharing

Nenad Noveljic

11 Comments

  1. Hi, Nenad Noveljic.

    I have the same problem with query parsing (ORA-04036) in Oracle 12.1 and Oracle 12.2.
    The query is very large and complex, but works fine in Oracle 11 and Oracle 10.
    I changed the hidden parameter _optimizer_squ_bottomup on Oracle 12c, but it did not work for me.

    Questions:
    Do you have another solution for query parsing?
    Does Oracle have any patch for Bug 22225899 : SLOW PARSE FOR COMPLEX QUERY?

    • Hi Josué,
      Are you 100% sure that ORA-04036 gets generated during parsing/optimizing and not during fetching?
      This can be verified by running explain plan for select …

        • Hi Josué,
          I managed to reproduce the memory leak in 12.2 by using your test case. The memory allocations are being made by the Oracle kernel function qcpiStoreParseText. The function is neither instrumented, nor are there any hits on Metalink for it. Since you managed to engineer a reproducible test case, opening an SR is a viable option. Let me know about the outcome.
          Good luck!
          Nenad

  2. Sorry, but it is incorrect to do sum(pga_allocated) as in your first query.
    By doing this, you’re summing together instantaneous PGA allocations from different points in time for the same session. You should use max for that.

      • Sorry, I forgot my manners: first of all, thanks for the great finding and for taking the time to share it!
        It helped me a lot in similar circumstances: even though the bug on Metalink is claimed to affect only 12.1 on HP-UX, the workaround also helped to reduce the PGA footprint of parsing on 18c running on Exadata.

        I was looking at the SVG you shared with both my eyes, but couldn’t figure out how you concluded that the “kkojnp” function is the culprit. What’s the right approach to reading such a flame graph?

        Thank you!

        • The y-axis represents the captured stacks. Broader stacks (i.e. wider on the x-axis) were captured more often. We can discern around 10 broader stacks. When you position the mouse pointer over a square, the function name and a percentage appear. This percentage is the contribution of the function (including its children) in this particular stack to the total number of samples. For a couple of the broadest stacks I moved the mouse upwards until I found a function which I was able to map to an optimizer operation (JPPD in this case). By adding up the percentages of the JPPD functions across the different stacks I arrived at a substantial amount. For example, kkojnp alone, which is, by the way, placed near the top of the stacks, appears in around 30% of the stacks.
          My case wasn’t necessarily a bug; I used the bug note just to figure out why the optimizer calls kkojnp (today, I’d use Frits Hoogland’s orafun.info instead). There were simply many JPPD calculations because of the large number of predicates and subqueries. Also, subquery unnesting increased the number of considered JPPD transformations (you can see some stacks with *SU* functions below the *JPPD* functions).
