Identify whether a process was killed by a signal in bash

Consider these two C programs:

#include <signal.h>

int main(void) {
    raise(SIGTERM);
}

int main(void) {
    return 143;
}

If I run either one, the value of $? in bash will be 143. The wait syscall lets you distinguish them, though:

wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11148
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 143}], 0, NULL) = 11214

And bash clearly uses this knowledge, since the first one results in Terminated being printed to the terminal (oddly, this happens even if I redirect both stdout and stderr elsewhere), and the second one doesn't. How can I differentiate these two cases from a bash script?

Joseph Sible-Reinstate Monica asked Jan 02 '23 16:01


2 Answers

I believe it is not possible to get the full exit status from pure bash or any other shell. The answers on the Unix & Linux Stack Exchange are very comprehensive:

What's common between all shells is that $? contains the lowest 8 bits of the exit code (the number passed to exit()) if the process terminated normally.

Where it differs is when the process is terminated by a signal. In all cases, and that's required by POSIX, the number will be greater than 128. POSIX doesn't specify what the value may be. In practice though, in all Bourne-like shells that I know, the lowest 7 bits of $? will contain the signal number. But, where n is the signal number,

  • in ash, zsh, pdksh, bash, and the Bourne shell, $? is 128 + n. This means that in those shells, if you get a $? of 129, you don't know whether the process exited with exit(129) or was killed by signal 1 (HUP on most systems). The rationale is that shells, when they exit themselves, by default return the exit status of the last exited command. Keeping $? from ever exceeding 255 makes that exit status consistent:

    $ bash -c 'sh -c "kill \$\$"; printf "%x\n" "$?"'
    bash: line 1: 16720 Terminated              sh -c "kill \$\$"
    8f # 128 + 15
    $ bash -c 'sh -c "kill \$\$"; exit'; printf '%x\n' "$?"
    bash: line 1: 16726 Terminated              sh -c "kill \$\$"
    8f # here that 0x8f is from an exit(143) done by bash. Though it's
       # not from a killed process, that does tell us that probably
       # something was killed by a SIGTERM
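
As a rough illustration of the 128 + n convention described above, the following sketch decodes a value of $? into a best guess. The helper name describe_status is made up for this example, and it assumes a shell whose kill -l accepts a signal number (bash and dash do). The result stays a guess, since the same value can also come from a plain exit().

```shell
# Best-effort decoding of $? under the 128 + n convention.
# describe_status is a hypothetical helper name, not a shell builtin.
describe_status() (
    status=$1
    # kill -l N prints the name of signal N and fails for unknown numbers.
    if [ "$status" -gt 128 ] && name=$(kill -l "$((status - 128))" 2>/dev/null); then
        # Ambiguous: a plain exit($status) produces the same value.
        echo "status $status: exit($status) or killed by signal $((status - 128)) ($name)"
    else
        echo "status $status: normal exit"
    fi
)

describe_status 143
describe_status 1
```

The function body runs in a subshell (parentheses instead of braces) so that its variables do not leak into the calling script.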
    

For this reason, I believe you would need to run the command from outside bash to capture the full exit status.
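
One possible way to run the command outside bash, sketched here under the assumption that python3 is installed, is to let Python do the waiting. Python's subprocess.call returns a negative number -N when the child was killed by signal N, so the two cases stay distinct. The function name run_and_classify and its output format are invented for this example.

```shell
# run_and_classify is a hypothetical helper; it assumes python3 is on PATH.
# It runs the given command as a child of Python and prints either
# "exited N" or "signaled N".
run_and_classify() {
    python3 -c '
import subprocess, sys
# returncode is -N if the child was terminated by signal N
rc = subprocess.call(sys.argv[1:])
print("signaled %d" % -rc if rc < 0 else "exited %d" % rc)
' "$@"
}

run_and_classify sh -c 'exit 143'        # prints: exited 143
run_and_classify sh -c 'kill -TERM $$'   # prints: signaled 15
```

Because Python, not bash, is the parent of the command, bash never sees the signal and prints no Terminated message.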


A similar question, in a more abstract form, has been asked about unbuffer, a small script written in Tcl. More precisely, unbuffer uses the libexpect library through a Tcl/Tk wrapper. From the source of unbuffer I extracted the relevant code to derive a workaround:

#!/bin/bash

expectStat() {
expect <(cat << EOT
set stty_init "-opost"
set timeout -1
eval [list spawn -noecho ] $@
expect
send_user "[wait]\n"
EOT
)
}

expectStat sleep 5 & 
wait

which prints a line like the following if sleep exits normally:

18383 exp4 0 0

If sleep is killed before it exits on its own, the script prints a line like:

18383 exp4 0 0 CHILDKILLED SIGTERM {software termination signal}

If a script terminates with exit 143, the script prints a line like:

18383 exp4 0 143

The meaning of these fields is described in the expect manual; the lines above are the return value of expect's built-in wait command. The first two values are the pid and expect's spawn id for the process. The third is 0 unless an operating system error occurred, and the fourth is the exit status. If the process was killed by a signal, more information is appended: the sixth value is the name of the signal sent to the process on termination.

wait

normally returns a list of four integers. The first integer is the pid of the process that was waited upon. The second integer is the corresponding spawn id. The third integer is -1 if an operating system error occurred, or 0 otherwise. If the third integer was 0, the fourth integer is the status returned by the spawned process. If the third integer was -1, the fourth integer is the value of errno set by the operating system. The global variable errorCode is also set.

Additional elements may appear at the end of the return value from wait. An optional fifth element identifies a class of information. Currently, the only possible value for this element is CHILDKILLED in which case the next two values are the C-style signal name and a short textual description.

This means that the fourth value and, if present, the sixth value are the ones you are looking for. Store the whole line and extract the signal and exit code, for example with the following code:

RET=$(expectStat script.sh)

# Filter status
EXITVALUE="$(echo "$RET" | cut -d' ' -f4)"
SIGNAL=$(echo "$RET" | cut -d' ' -f6)

#echo "Exit value: $EXITVALUE, Signal: $SIGNAL" 

if [ -n "$SIGNAL" ]; then
        echo "Likely killed by signal"
else
        echo "$EXITVALUE"
fi

In conclusion, this workaround is very inelegant. There may be another tool that ships its own C-based helper for detecting whether a signal occurred.

blubase answered Jan 04 '23 06:01


wait is a syscall and also a bash builtin.

To differentiate the two cases from bash, run the program in the background and use the builtin wait to report the outcome.

Below are examples of both a non-zero exit code and an uncaught signal. They use the exit and kill builtins in a child bash shell; in place of that child shell, you would run your program.

$ bash -c 'kill -s SIGTERM $$' & wait
[1] 36068
[1]+  Terminated: 15          bash -c 'kill -s SIGTERM $$'
$ bash -c 'exit 143' & wait
[1] 36079
[1]+  Exit 143                bash -c 'exit 143'
$

As for why Terminated is printed to the terminal even when you redirect stdout and stderr: the message is printed by bash, not by the program.

Update:

By explicitly using the wait builtin you can now redirect its stderr (with the exit status of the program) to a separate file.

The following examples show the three types of termination: normal exit 0, non-zero exit, and uncaught signal. The results reported by wait are stored in files tagged with the PID of the corresponding program.

$ bash -c 'exit 0' & wait 2> exit_status_pid_$!
[1] 40279
$ bash -c 'exit 143' & wait 2> exit_status_pid_$!
[1] 40291
$ bash -c 'kill -s SIGTERM $$' & wait 2> exit_status_pid_$!
[1] 40303
$  for f in exit_status_pid*; do echo $f: $(cat $f); done
exit_status_pid_40279: [1]+ Done bash -c 'exit 0'
exit_status_pid_40291: [1]+ Exit 143 bash -c 'exit 143'
exit_status_pid_40303: [1]+ Terminated: 15 bash -c 'kill -s SIGTERM $$'
$
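
To use those saved lines from a script, something like the sketch below could turn one job-status line into a verdict. The name classify_job_line is made up for this example, and the parsing assumes the message format shown in the transcript above, which is not a stable interface and may differ between bash versions and platforms. It also keeps only the first word of the signal description, so multi-word descriptions are truncated.

```shell
# classify_job_line is a hypothetical helper. It parses one job-status line
# in the format shown in the transcript (e.g. "[1]+ Exit 143 bash -c ...").
classify_job_line() (
    # Fields: job marker (e.g. "[1]+"), state word, remainder of the line.
    read -r _job state rest <<EOF
$1
EOF
    case $state in
        Done) echo "exited 0" ;;
        Exit) echo "exited ${rest%% *}" ;;      # first word after "Exit"
        *)    echo "signaled: ${state%:}" ;;    # strip a trailing colon
    esac
)

classify_job_line "[1]+ Exit 143 bash -c 'exit 143'"
classify_job_line "[1]+ Terminated: 15 bash -c 'kill -s SIGTERM \$\$'"
```

Anything that is neither Done nor Exit is treated as a signal, which is an assumption of this sketch rather than documented behavior.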

amdn answered Jan 04 '23 05:01