Log Switch Checkpoint
During a log switch checkpoint, dirty blocks are written to disk. Keep in mind that a redo log cannot be reused until its checkpoint has completed. Multiple uncompleted checkpoints can therefore prevent further log switches, which in turn causes write operations to wait until the redo logs can be reused.
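One quick way to see whether this is happening on a live system is to check the status of the redo log groups: a group whose checkpoint has not yet completed stays in status ACTIVE and cannot be reused. For example:
-- groups still needed for instance recovery (checkpoint not completed)
-- show up as ACTIVE; the group currently being written to is CURRENT
select group#, sequence#, status, first_time
from v$log
order by sequence# ;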
Because the duration of checkpoints is so important, I've developed a couple of SQL queries to analyze checkpoint patterns, which might be helpful for identifying I/O performance bottlenecks or properly sizing redo logs.
Alert Log
The timestamps of the beginning and completion of individual checkpoints are recorded in the alert log if the parameter log_checkpoints_to_alert is set to TRUE. The parameter is set to FALSE by default, so you need to enable the gathering of checkpoint information first.
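The parameter is dynamic, so it can be enabled without a restart (scope=both assumes you are running with an spfile):
alter system set log_checkpoints_to_alert = true scope=both ;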
The content of the alert log can be queried through the view V$DIAG_ALERT_EXT, which accesses the alert log file as an external table. Since more complex queries on an external table wouldn't perform well, I'm going to load the data into a table first:
create table checkpoint_events (
  db varchar2(9 byte) not null,
  originating_timestamp timestamp(9) not null,
  event varchar2(9) not null,
  scn integer not null
) ;
insert into checkpoint_events
select
  regexp_replace(adr_home,'.*/(.*)','\1'),
  originating_timestamp,
  regexp_substr(message_text,'Beginning|Completed'),
  replace(
    regexp_replace(message_text,'.*SCN: ([[:digit:]]+)','\1'), chr(10),''
  )
from V$DIAG_ALERT_EXT
where
  (
    message_text like 'Beginning%checkpoint%' or
    message_text like 'Completed%checkpoint%'
  )
;
commit ;
In addition to loading the data, the commands above do the following:
- extract the checkpoint SCN from the message text (example messages are shown below),
- discard non-relevant information,
- extract the database name from adr_home.
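For reference, the messages being parsed look roughly like the following two lines; the exact wording varies between Oracle versions, which is why the regular expressions above are kept deliberately loose:
Beginning log switch checkpoint up to RBA [0x2d0.2.10], SCN: 9236890926896
Completed checkpoint up to RBA [0x2d0.2.10], SCN: 9236890926896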
V$DIAG_ALERT_EXT displays the content of the alert logs of all databases on a server. Hence, if you're uncomfortable with running these commands on the production server, you can transfer the XML version of the alert log to a test server and do the analysis there. Similarly, if you'd like to focus the analysis on a single database, just add where db='DBNAME' to the queries below.
The beginning and the completion of a checkpoint are stored in two separate rows, which makes reporting a bit awkward. Therefore, the following view does a self-join on the table checkpoint_events to encapsulate each checkpoint in a single row:
create or replace view checkpoints as
select b.db,
       sysdate
         + ( c.originating_timestamp - b.originating_timestamp ) * 86400
         - sysdate duration,
       b.originating_timestamp started,
       c.originating_timestamp completed,
       b.scn
from checkpoint_events b, checkpoint_events c
where b.event = 'Beginning' and c.event = 'Completed'
  and b.scn = c.scn and b.db = c.db ;
The view also computes the checkpoint duration in seconds: adding the timestamp difference (an interval) multiplied by 86400 to sysdate and then subtracting sysdate again converts the interval into a plain number of seconds.
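If you find the date arithmetic too obscure, the same conversion can be written more verbosely with EXTRACT on the interval; here's a quick sanity check against the first row of the Top 10 output below:
-- converting an interval to seconds with EXTRACT
select extract(day    from diff) * 86400
     + extract(hour   from diff) * 3600
     + extract(minute from diff) * 60
     + extract(second from diff) as seconds
from (
  select timestamp '2017-09-20 11:14:57'
       - timestamp '2017-09-20 10:57:20' as diff
  from dual
) ;
-- SECONDS: 1057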
For formatting the output you can use the following settings:
alter session set NLS_TIMESTAMP_FORMAT='dd-MON-yy hh24:mi:ss' ;
column db format a6
column duration format 999999
column avg_duration format 999999
column started format a19
column completed format a19
column scn format 99999999999999
Top 10 Checkpoints
With the information properly prepared, displaying the longest-running checkpoints becomes trivial:
select * from checkpoints order by duration desc fetch first 10 rows only ;
DB DURATION STARTED COMPLETED SCN
------ -------- ------------------- ------------------- --------------
DBXX7 1057 20-SEP-17 10:57:20 20-SEP-17 11:14:57 9236890926896
DBXX9 798 18-SEP-17 22:56:43 18-SEP-17 23:10:00 9239503025928
DBXX12 713 18-SEP-17 22:36:58 18-SEP-17 22:48:51 9239502835402
DBPX1 575 11-SEP-17 20:45:55 11-SEP-17 20:55:30 9085573686452
DBPX1 517 11-SEP-17 20:50:15 11-SEP-17 20:58:52 9085574034974
DBPX1 457 16-SEP-17 09:39:31 16-SEP-17 09:47:08 9085912947207
DBXX1 431 16-SEP-17 01:07:25 16-SEP-17 01:14:36 9231717131040
DBXX4 430 20-SEP-17 00:01:37 20-SEP-17 00:08:46 9236909077807
DBPX1 428 16-SEP-17 09:37:30 16-SEP-17 09:44:38 9085912733264
DBPX1 421 21-SEP-17 20:47:03 21-SEP-17 20:54:04 9086255245605
Concurrent Checkpoints
The next interesting piece of information we can get from the view is the number of concurrent checkpoints over time. If this value is higher than the number of your redo log groups, sessions are going to wait. Therefore, you should further analyze the intervals of high concurrency.
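The number of redo log groups to compare against is easy to get:
-- redo log groups per thread
select thread#, count(*) log_groups
from v$log
group by thread# ;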
select distinct db,
case when
period_started between
lag(period_started,1) over (partition by db order by period_started)
and
lag(period_completed,1) over (partition by db order by period_started)
and cnt < lag(cnt,1) over (partition by db order by period_started)
then
lag(period_completed,1) over (partition by db order by period_started)
else
period_started
end as started,
case when
period_completed between
lead(period_started,1) over (partition by db order by period_started)
and
lead(period_completed,1) over (partition by db order by period_started)
and cnt < lead(cnt,1) over (partition by db order by period_started)
then
lead(period_started,1) over (partition by db order by period_started)
else
period_completed
end as completed,
cnt concurrency
from
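-- carry the surviving period boundaries across the rows whose
-- boundaries were set to NULL one level below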
(select v_concatenated.*,
last_value(concatenated_period_started) ignore nulls
over (
partition by db order by started
rows between unbounded preceding and current row
) as period_started,
first_value(concatenated_period_completed) ignore nulls
over (
partition by db order by started
rows between current row and unbounded following
) as period_completed
from
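-- v_concatenated: NULL out the start/end of periods that merely
-- continue an adjacent period with the same concurrency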
( select v_grouped.*,
case when started
between
lag(started,1) over (partition by db order by started) and
lag(completed,1) over (partition by db order by started)
and lag(cnt) over (partition by db order by started) = cnt
then NULL
else started
end as concatenated_period_started,
case when completed
between
lead(started,1) over (partition by db order by started) and
lead(completed,1) over (partition by db order by started)
and lead(cnt) over (partition by db order by started) = cnt
then NULL
else completed
end as concatenated_period_completed
from
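-- v_grouped: for each checkpoint, count the later checkpoints
-- that start before it completes (overlapping date ranges)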
( select c1.db,c1.started,c1.completed,count(*)+1 cnt
from checkpoints c1 inner join checkpoints c2 on c1.db = c2.db
and c2.started > c1.started and c2.started <= c1.completed
group by c1.db,c1.started,c1.completed
) v_grouped
) v_concatenated
)
order by
concurrency desc, started
fetch first 10 rows only
;
DB STARTED COMPLETED CONCURRENCY
------ ------------------- ------------------- -----------
DBXX7 23-MAR-17 15:36:39 23-MAR-17 15:38:24 4
DBXX8 05-JUN-17 20:28:14 05-JUN-17 20:34:27 4
DBXX2 22-JUL-17 18:31:05 22-JUL-17 18:37:48 4
DBXX2 11-AUG-17 10:51:20 11-AUG-17 11:01:35 4
DBXX7 22-AUG-17 11:04:22 22-AUG-17 11:09:26 4
DBXX7 28-AUG-17 08:57:02 28-AUG-17 09:02:08 4
DBXX1 04-SEP-17 08:54:08 04-SEP-17 09:01:03 4
DBXX7 05-SEP-17 10:02:11 05-SEP-17 10:08:48 4
DBXX7 13-SEP-17 09:09:02 13-SEP-17 09:16:06 4
DBXX12 14-SEP-17 16:47:07 14-SEP-17 16:51:12 4
Alternatively, you can change the order by clause to order by started to observe how the checkpoint concurrency changes over time:
DB STARTED COMPLETED CONCURRENCY
------ ------------------- ------------------- -----------
DBXX7 23-MAR-17 15:13:53 23-MAR-17 15:18:58 2
DBXX7 23-MAR-17 15:36:39 23-MAR-17 15:38:24 4
DBXX7 23-MAR-17 15:38:24 23-MAR-17 15:42:09 3
DBXX7 23-MAR-17 15:42:09 23-MAR-17 15:42:30 2
DBXX8 09-MAY-17 02:00:04 09-MAY-17 02:05:07 2
DBXX8 11-MAY-17 02:00:50 11-MAY-17 02:05:53 2
DBXX8 16-MAY-17 02:21:50 16-MAY-17 02:26:54 2
DBXX8 05-JUN-17 20:28:14 05-JUN-17 20:34:27 4
DBXX8 05-JUN-17 20:34:27 05-JUN-17 20:35:00 3
DBXX8 05-JUN-17 20:35:00 05-JUN-17 20:36:18 2
The innermost subquery, v_grouped, extracts the time intervals with concurrent checkpoints by using the technique described in the Stack Overflow question Detect Overlapping date ranges from the same table. The outer subqueries use Oracle analytic (window) functions to concatenate and align adjacent time intervals.
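The carry-forward step is perhaps the least obvious part, so here is a minimal standalone sketch of LAST_VALUE ... IGNORE NULLS filling in the gaps left by the NULLed-out boundaries:
-- toy example: carry the last non-NULL value forward
with t (id, val) as (
  select 1, 'A'  from dual union all
  select 2, null from dual union all
  select 3, null from dual union all
  select 4, 'B'  from dual
)
select id,
       last_value(val) ignore nulls over (
         order by id
         rows between unbounded preceding and current row
       ) as carried  -- A, A, A, B
from t ;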
It was not until I had seen Lukas Eder's presentation and read his blog post 10 SQL Tricks That You Didn't Think Were Possible that I started using window functions for non-trivial data processing.
Time Intervals
Lastly, I'll compare checkpoint activity across different time intervals, and I'd like to avoid hard-coding the duration of the interval. Sadly, unlike SQL Server, Oracle doesn't provide parametrized views, so I'm going to emulate the functionality with a pipelined function:
create or replace type checkpoint_intervals_ty is object (
db varchar2(9 byte), started timestamp, duration number,
chkpt_count integer
) ;
/
CREATE OR REPLACE type checkpoint_intervals_tbl_ty is
table of checkpoint_intervals_ty;
/
CREATE OR REPLACE FUNCTION checkpoint_intervals ( p_minutes integer)
RETURN checkpoint_intervals_tbl_ty PIPELINED AS
CURSOR cur (p_minutes integer)
IS
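    -- bucket each checkpoint into the p_minutes-wide interval that
    -- contains its start time (assumes p_minutes evenly divides 60)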
select db,
(
trunc(started,'hh24')
+ (trunc(to_char(started,'mi')/p_minutes)*p_minutes)/24/60
) interval_start,
sum(duration) total_duration,
count(*) count_chkpts
from checkpoints
group by db,
(
trunc(started,'hh24')
+ (trunc(to_char(started,'mi')/p_minutes)*p_minutes)/24/60
)
;
BEGIN
FOR rec IN cur (p_minutes)
LOOP
pipe row( checkpoint_intervals_ty (
rec.db,rec.interval_start,rec.total_duration,rec.count_chkpts
));
END LOOP;
RETURN;
END;
/
Finally, we can easily get information about the most intensive 30-minute time intervals by selecting from the pipelined function:
select t.*,t.duration/t.chkpt_count avg_duration
from table(checkpoint_intervals(30)) t
order by duration desc
fetch first 5 rows only ;
DB STARTED DURATION CHKPT_COUNT AVG_DURATION
------ ------------------- -------- ----------- ------------
DBXX1 04-SEP-17 08:30:00 3597 12 300
DBXX2 22-JUL-17 18:30:00 3341 11 304
DBXX2 22-JUL-17 23:00:00 3051 10 305
DBPX1 16-SEP-17 09:30:00 2861 8 358
DBXX2 22-JUL-17 23:30:00 2785 9 309
Other useful sorting criteria are the number of checkpoints and the average checkpoint duration, that is, chkpt_count desc and duration/chkpt_count desc, respectively.
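For instance, sorted by the number of checkpoints (only the order by clause changes):
select t.*,t.duration/t.chkpt_count avg_duration
from table(checkpoint_intervals(30)) t
order by chkpt_count desc
fetch first 5 rows only ;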
DB STARTED DURATION CHKPT_COUNT AVG_DURATION
------ ------------------- -------- ----------- ------------
DBXX1 20-SEP-17 08:00:00 1811 13 139
DBXX1 04-SEP-17 08:30:00 3597 12 300
DBXX9 20-SEP-17 18:30:00 1472 12 123
DBXX2 22-JUL-17 18:30:00 3341 11 304
DBXX2 22-JUL-17 23:00:00 3051 10 305
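And sorted by the average checkpoint duration:
select t.*,t.duration/t.chkpt_count avg_duration
from table(checkpoint_intervals(30)) t
order by duration/chkpt_count desc
fetch first 5 rows only ;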
DB STARTED DURATION CHKPT_COUNT AVG_DURATION
------ ------------------- -------- ----------- ------------
DBXX7 20-SEP-17 10:30:00 1057 1 1057
DBXX9 18-SEP-17 22:30:00 798 1 798
DBXX12 18-SEP-17 22:30:00 713 1 713
DBXX4 20-SEP-17 00:00:00 430 1 430
DBPX1 11-SEP-17 20:30:00 1700 4 425
Here, I've used two techniques described on Ask TOM to implement data aggregation over flexible time intervals by using a pipelined function.
BTW, if you think parametrized views should be implemented in Oracle, you can vote for this idea here.