While analyzing complex problems and researching internals in Oracle database, I’ve often had to observe process interaction in slow motion. This requires temporarily suspending a process until another process finishes a relevant chunk of work, and then resume the suspended process.
For example, if you suspend log writer for a while, user process doing commit will be hanging during this time. You can see that log file sync wait event time increased during this time.
@ashtop.sql event2 "event='log file sync'" &1min
Total Distinct
Seconds AAS %This EVENT2 FIRST_SEEN LAST_SEEN Execs Seen
--------- ------- ------- ------------------------------------------ ------------------- ------------------- ----------
76 1.3 100% | log file sync 2021-03-17 17:54:08 2021-03-17 17:54:46 1
Of course, this is something you should do in lab environments only.
There are three ways to suspend an Oracle process. First two, sending signals and debugger, are generic – you can use them with any process. The third method uses the sqlplus oradebug command. It’s main advantage is that it doesn’t require shell access to the database server. You must, though, connect as sysdba.
Sending signals
With the Unix kill command you can send STOP and CONT signals to a process:
kill -STOP PID
kill -CONT PID
On Solaris you can alternatively use prun and pstop commands:
pstop PID
prun PID
Debugger
gdb oracle
attach PID
The process will suspend after you attach to it with debugger:
0x00007fffa617741a in _portfs () from /lib/64/libc.so.1
(gdb)
In prompt you can then type the command to take the call stack or any other internal information of interest, for example:
(gdb) bt
#0 0x00007fffa617741a in _portfs () from /lib/64/libc.so.1
#1 0x00007fffa608485a in port_get () from /lib/64/libc.so.1
#2 0x00000000060e19bd in sskgpwwait ()
#3 0x00000000060e15a3 in skgpwwait ()
#4 0x00000000062e35c4 in ksliwat ()
#5 0x00000000062e27af in kslwaitctx ()
#6 0x0000000007210d27 in ksarcv ()
#7 0x000000000720fb29 in ksbabs ()
#8 0x000000000720dd21 in ksbrdp ()
#9 0x0000000006e14599 in opirip ()
#10 0x000000000602cab0 in opidrv ()
#11 0x000000000602c8c7 in sou2o ()
#12 0x000000000602c694 in opimai_real ()
#13 0x000000000602c1eb in ssthrdmain ()
#14 0x000000000602bf59 in main ()
Once you’re finished with collecting data, you can resume the execution:
continue
SQLPlus
After connecting as sysdba, first you specify the target process:
@bg2.sql lgwr
NAME DESCRIPTION SID OPID SPID PADDR SADDR
----- ---------------------------------------------------------------- ---------- ---------- ------------------------ ---------------- ----------------
LGWR Redo etc. 129 32 26695 00000000A73D2C28 00000000A7706660
oradebug setospid 26695
Oracle pid: 32, Unix process pid: 26695, image: oracle@server (LGWR)
Then you can use suspend and resume commands.
oradebug suspend
oredebug resume