killsnoop
The bcc script killsnoop traces signals.
A short demo:
/usr/share/bcc/tools/killsnoop -s SIGUSR2
kill -SIGUSR2 1219543
TIME PID COMM SIG TPID RESULT
14:51:23 601489 bash 12 1219543 0
The current implementation of killsnoop is based on tracing the kill() system call, which sends a signal to a process or a process group. killsnoop prints the caller’s PID and command name (comm), and extracts the target PID and the signal number from the call’s arguments.
tgkill()
However, Linux has another system call that can send signals: tgkill(), which sends a signal to a specific thread. Since killsnoop doesn’t trace tgkill(), it doesn’t show signals sent with it.
Unlike kill(), tgkill() had no glibc wrapper before glibc 2.30, so on older systems it has to be invoked through syscall(). There isn’t a shell command that can send a signal to a specific thread, either.
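To illustrate what "a specific thread" means, here is a minimal sketch, assuming Linux and glibc; the names worker and tgkill_reaches_target_thread are hypothetical. A worker thread publishes its TID, the main thread calls tgkill() with the process ID (TGID) and that TID, and the SIGUSR2 handler records the TID of the thread it ran in. A signal sent with kill() is process-directed and may be delivered to any thread that doesn't block it; a signal sent with tgkill() goes only to the named thread.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready = PTHREAD_COND_INITIALIZER;
static pid_t worker_tid;                  /* published under lock */
static volatile sig_atomic_t handled_tid; /* TID the handler ran in */

static void on_sigusr2(int sig)
{
    (void)sig;
    /* Raw gettid: older glibc has no gettid() wrapper either. */
    handled_tid = (sig_atomic_t)syscall(SYS_gettid);
}

/* Hypothetical helper thread that waits for a thread-directed SIGUSR2. */
static void *worker(void *arg)
{
    sigset_t usr2, old, wait_mask;
    (void)arg;

    /* Keep SIGUSR2 blocked outside sigsuspend() so the wakeup cannot be lost. */
    sigemptyset(&usr2);
    sigaddset(&usr2, SIGUSR2);
    pthread_sigmask(SIG_BLOCK, &usr2, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGUSR2);

    pthread_mutex_lock(&lock);
    worker_tid = (pid_t)syscall(SYS_gettid);
    pthread_cond_signal(&ready);
    pthread_mutex_unlock(&lock);

    while (handled_tid == 0)
        sigsuspend(&wait_mask); /* atomically unblock SIGUSR2 and wait */

    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return NULL;
}

/* Returns 1 if SIGUSR2 was handled in the targeted thread, 0 if not, -1 on error. */
int tgkill_reaches_target_thread(void)
{
    struct sigaction sa;
    pthread_t t;
    pid_t target;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr2;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) == -1)
        return -1;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;

    /* Wait until the worker has published its TID. */
    pthread_mutex_lock(&lock);
    while (worker_tid == 0)
        pthread_cond_wait(&ready, &lock);
    target = worker_tid;
    pthread_mutex_unlock(&lock);

    /* Same argument order as my_tgkill: thread group ID, then thread ID. */
    if (syscall(SYS_tgkill, getpid(), target, SIGUSR2) == -1)
        return -1; /* note: the worker is left waiting on this error path */

    pthread_join(t, NULL);
    return handled_tid == (sig_atomic_t)target;
}
```

Call it once from main(), since the static state isn't reset between calls. The main thread leaves SIGUSR2 unblocked on purpose: the handler still runs in the worker, because tgkill() targets that thread only. On glibc older than 2.34, link with -pthread.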
The following C program makes a system call to tgkill():
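#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* my_tgkill TGID TID: send SIGUSR2 to thread TID in thread group TGID */
int main( int argc, char *argv[] ){
  long ret;
  int tgid, tid;
  if (argc != 3) {
    fprintf(stderr, "usage: %s TGID TID\n", argv[0]);
    return 1;
  }
  tgid = atoi(argv[1]);
  tid = atoi(argv[2]);
  /* raw system call: works even where glibc has no tgkill() wrapper */
  ret = syscall(SYS_tgkill, tgid, tid, SIGUSR2);
  if (ret == -1) perror("tgkill");
  return ret == 0 ? 0 : 1;
}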
The signal sent with tgkill() is captured by strace, but doesn’t appear in the killsnoop output:
my_tgkill 1219543 1219543
strace -e trace=none -e signal=SIGUSR2 -p 1219543
--- SIGUSR2 {si_signo=SIGUSR2, si_code=SI_TKILL, si_pid=244299, si_uid=1000} ---
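The strace line reports si_code=SI_TKILL. A receiving process can make the same distinction itself: a handler installed with SA_SIGINFO sees si_code set to SI_USER for a signal sent with kill() and SI_TKILL for one sent with tgkill(). The following is a minimal sketch, assuming Linux and glibc, that sends SIGUSR2 to the calling thread both ways and reports the si_code; the function name send_sigusr2_to_self is hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal;
static volatile sig_atomic_t last_code;

static void on_sigusr2(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    last_code = info->si_code; /* e.g. SI_USER or SI_TKILL */
    got_signal = 1;
}

/*
 * Sends SIGUSR2 to the calling thread, with kill() if use_tgkill is 0 or
 * with tgkill() otherwise, and stores the si_code the handler observed.
 * Returns 0 on success, -1 on error.
 */
int send_sigusr2_to_self(int use_tgkill, int *si_code)
{
    struct sigaction sa;
    long rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigusr2;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) == -1)
        return -1;

    got_signal = 0;
    if (use_tgkill)
        rc = syscall(SYS_tgkill, getpid(), syscall(SYS_gettid), SIGUSR2);
    else
        rc = kill(getpid(), SIGUSR2);

    /* With SIGUSR2 unblocked in a single-threaded caller, the handler
       has already run by the time the call returns. */
    if (rc == -1 || !got_signal)
        return -1;

    *si_code = last_code;
    return 0;
}
```

Called with use_tgkill set to 1, it should store SI_TKILL, consistent with the strace output; with 0 it should store SI_USER. The function reports the signal's origin but does not trace it; system-wide tracing still needs a tool like the bpftrace script.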
I became aware of this killsnoop limitation while troubleshooting an Oracle issue involving heavy signalling. The Oracle background processes sent their signals with tgkill() system calls, so killsnoop didn’t capture them.
The bpftrace script below traces signals that were sent with tgkill():
sudo bpftrace -e 'BEGIN
{
printf ("%-16s ", "TIME");
printf ("%-16.16s %-6s %-8s %-10s %-12s %4s\n", "COMM", "PID", "TGID", "TPID", "SIGNAL", "RETURN");
}
tracepoint:syscalls:sys_enter_tgkill
{
@args_tgid[tid] = args->tgid;
@args_pid[tid] = args->pid;
@args_sig[tid] = args->sig;
}
tracepoint:syscalls:sys_exit_tgkill
/ @args_tgid[tid] /
{
time("%D:%M:%S ");
printf("%-16.16s %-6d %-8d %-10d %-12d %-4d\n", comm, pid, @args_tgid[tid], @args_pid[tid], @args_sig[tid], args->ret);
delete(@args_tgid[tid]);
delete(@args_pid[tid]);
delete(@args_sig[tid]);
}'
TIME COMM PID TGID TPID SIGNAL RETURN
06/14/22:58:42 my_tgkill 304892 1219543 1219543 12 0
You can run it in parallel with killsnoop to see signals sent by both kill() and tgkill().
Summary
The current implementation of the killsnoop bcc script is based on tracing the kill() system call. Consequently, it misses signals sent with the tgkill() system call, which Oracle background processes use to send signals. You can run the bpftrace script above concurrently with killsnoop to capture signals sent by both system calls.