 

Bash child script exits along with parent script when the parent is invoked interactively (by terminal), but not when invoked non-interactively (by cron)

Tags:

linux

bash

shell

This is parent.sh:

#!/bin/bash

trap 'exit' SIGHUP SIGINT SIGQUIT SIGTERM

if ! [ -t 0 ]; then # if running non-interactively
    sleep 5 & # allow a little time for child to generate some output
    set -bm # to be able to trap SIGCHLD
    trap 'kill -SIGINT $$' SIGCHLD # when sleep is done, interrupt self automatically - cannot issue interrupt by keystroke since running non-interactively
fi

sudo ~/child.sh

This is child.sh:

#!/bin/bash

test -f out.txt && rm out.txt

for second in {1..10}; do
    echo "$second" >> out.txt
    sleep 1
done

If the parent script is run in a terminal like so...

~/parent.sh

...and an interrupt is issued by keystroke after about 3 seconds, then checking out.txt a few seconds later shows...

1  
2  
3  

...thus indicating that both parent and child ended upon the (keystroke) interrupt. This is corroborated by checking ps -ef in real time: the script processes are present before the interrupt and gone after it.

If the parent script is invoked by cron like so...

* * * * * ~/parent.sh  

...the content of out.txt is always...

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  

...thus indicating that at least the child did not end upon the (kill command) interrupt. This is corroborated by checking ps -ef in real time: the script processes are present before the interrupt, but afterwards only the parent process is gone, while the child process persists until it runs its course.

Attempts to solve...

  1. Shell options can only be a factor here insofar as non-interactive invocations of the parent run set -bm (which makes the PGIDs of the children differ from the PGID of the parent; relevant below). Other than that, both scripts show only the options hB enabled, whether running interactively or not.
  2. Looked through man bash for clues but found nothing helpful.
  3. Tried a handful of web searches, which turned up many results from Stack Overflow, but while some were similar to this question, none were the same. The closest answers entailed...
    • using wait to get the child process ID and invoking kill on it, which results in "/parent.sh: line 30: kill: (17955) - Operation not permitted"
    • invoking kill on the process group, which results in "~/parent.sh: line 31: kill: (-15227) - Operation not permitted" (kill using the PGID of the child, which differs from the parent's when non-interactive because job control is enabled)
    • looping through the current jobs and killing each one
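The PGID effect of set -m mentioned in item 1 can be checked directly. The following is a minimal sketch, assuming Linux with procps ps (for ps -o pgid= -p); run it non-interactively, e.g. with bash, as cron would. It starts one background sleep with job control off and one with it on, then prints the process groups.

```shell
#!/bin/bash
# Sketch: compare a background job's process group with and without job control.
# Assumes Linux with procps ps; run non-interactively, e.g. "bash pgid-demo.sh".

pgid_of() {
    # Print the PGID of PID $1 without ps's padding spaces
    ps -o pgid= -p "$1" | tr -d ' '
}

script_pgid=$(pgid_of "$$")

sleep 5 &                      # job control off: the job stays in the script's group
plain_pid=$!
plain_pgid=$(pgid_of "$plain_pid")

set -m                         # job control on, as "set -bm" does in parent.sh
sleep 5 &                      # this job gets a process group of its own
jc_pid=$!
set +m
jc_pgid=$(pgid_of "$jc_pid")

echo "script PGID:                $script_pgid"
echo "job PGID without set -m:    $plain_pgid"
echo "job PID / PGID with set -m: $jc_pid / $jc_pgid"

kill "$plain_pid" "$jc_pid"    # clean up the background sleeps
```

Without set -m the job's PGID should match the script's; with it, the job's PGID should equal the job's own PID, which is why the PGID used in the second bullet differs from the parent's.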

Is the problem with these solutions that the parent runs as a regular user while the child runs as root via sudo (it will ultimately be a binary, not a suid script), so the parent cannot kill it? If that's what "Operation not permitted" means, why is the sudo-invoked process killable by a keystroke interrupt sent via the terminal?

The preference is to avoid additional code unless it's necessary: since the scripts behave correctly when run interactively, it's much preferred, if feasible, to simply get the same behavior when running non-interactively (by cron).

The bottom-line question is: what can be done to make an interrupt (or term) signal issued while running non-interactively produce the same behavior as an interrupt signal issued while running interactively?

Thanks. Any help is greatly appreciated.

S Kos asked Dec 20 '16 02:12


1 Answer

  1. When you manually run the script from an interactive shell (usually running on a pty), it's the terminal driver that catches CTRL-C, converts it to SIGINT, and sends it to all processes in the foreground process group (the script itself and the sudo command). Because the signal comes from the terminal driver rather than from your own user process, the permission check behind "Operation not permitted" does not apply.
  2. When your script runs from cron, you only send SIGINT to the shell script itself. The sudo command keeps running, and in this kind of scenario bash does not kill its child when it exits.
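Point 2 can be reproduced without cron or sudo. This sketch is an illustration under assumptions: a throwaway bash -c process plays the parent, sleep 30 stands in for the child, and SIGTERM is sent to the parent's PID only. If the child is still alive afterwards, bash did not take it down when it exited.

```shell
#!/bin/bash
# Sketch: signal only a parent bash process and check whether its child survives.
# Assumption: "sleep 30" stands in for the sudo'd child.sh; no sudo is involved.

pidfile=$(mktemp)

# Parent: a non-interactive bash that starts a child and waits for it.
bash -c 'sleep 30 & echo "$!" > "$1"; wait' parent-demo "$pidfile" &
parent=$!

# Wait (up to about 5 s) until the parent has recorded its child's PID.
for _ in {1..50}; do
    [ -s "$pidfile" ] && break
    sleep 0.1
done
child=$(cat "$pidfile")

# SIGTERM rather than SIGINT: this parent is an async command started without
# job control, so bash makes it ignore SIGINT.
kill -TERM "$parent"            # signal the parent only, as in the cron case
wait "$parent" 2>/dev/null      # reap it; the parent is gone after this

if kill -0 "$child" 2>/dev/null; then
    child_survived=yes
    echo "parent $parent exited, child $child is still running"
else
    child_survived=no
    echo "child $child exited along with its parent"
fi

kill "$child" 2>/dev/null       # clean up the orphaned sleep
rm -f "$pidfile"
```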

To explicitly send a signal to a whole process group, you can use the negative process group ID. In your case the PGID should be the PID of the shell script, so try this:

trap 'kill -SIGINT -$$' SIGCHLD

UPDATE:

It turns out my assumption about the value of the PGID was wrong. I just did a test with this simple cron.sh:

#!/bin/bash
set -m
sleep 888 &
sudo sleep 999

and crontab -l looks like this:

30 * * * * /root/tmp/cron.sh

While the cron job is running, ps shows this:

 PPID    PID   PGID    SID   COMMAND
15486  15487  15487  15487   /bin/sh -c /root/tmp/cron.sh
15487  15488  15487  15487   /bin/bash /root/tmp/cron.sh
15488  15489  15489  15487   sleep 888
15488  15490  15490  15487   sudo sleep 999
15490  15494  15490  15487   sleep 999
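The exact ps invocation behind this listing isn't shown; one way to get similar PPID/PID/PGID/SID columns with procps ps is sketched below. The awk pattern is only an example filter tailored to the cron.sh test above.

```shell
#!/bin/bash
# Sketch: list PPID, PID, PGID, SID and command line for every process,
# keeping the header plus the cron.sh and "sleep 888"/"sleep 999" lines.
# Assumes procps ps; the filter patterns are specific to the test above.
listing=$(ps -e -o ppid,pid,pgid,sid,args | awk 'NR == 1 || /cron\.sh|sleep (888|999)/')
printf '%s\n' "$listing"
```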

So sudo (and its child) runs in a separate process group whose PGID is sudo's own PID (15490), not the PID of cron.sh, so my solution (kill -INT -$$) would not work.

Then I think we can solve the problem like this:

#!/bin/bash
set -m                     # give each background job its own process group
sudo sleep 999 &           # run sudo in the background
pid=$!                     # save the PID, which is also the PGID
sleep 5
sudo kill -INT -- -"$pid"  # signal the whole process group; sudo is needed
                           # since we're killing root's processes
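To try the process-group kill without root, the following sketch replaces sudo sleep 999 with a subshell running two sleeps (a stand-in for sudo plus the process it starts; an assumption, not the real setup). It enables set -m only while launching the job, signals the whole group through a negative PID, and counts the group's live (non-zombie) processes before and after. It uses SIGTERM instead of SIGINT because, per the bash manual, asynchronous commands started without job control (like the sleep & inside the subshell) ignore SIGINT.

```shell
#!/bin/bash
# Sudo-free sketch of the process-group kill. Assumption: the subshell below
# stands in for "sudo child.sh" (sudo plus the process it starts).

count_in_group() {
    # Print how many non-zombie processes have process group ID $1
    ps -e -o pgid=,stat= |
        awk -v g="$1" '$1 == g && $2 !~ /^Z/ { n++ } END { print n + 0 }'
}

set -m                              # the next background job leads its own group
( sleep 30 & sleep 30 ) &           # stand-in for "sudo ~/child.sh &"
pgid=$!                             # the job's PID is also its PGID
set +m

# Wait (up to about 2 s) for the job's processes to start.
for _ in {1..20}; do
    [ "$(count_in_group "$pgid")" -ge 2 ] && break
    sleep 0.1
done
before=$(count_in_group "$pgid")

# A negative PID addresses the whole group. SIGTERM is used because the
# inner "sleep 30 &" runs without job control and therefore ignores SIGINT.
kill -TERM -- -"$pgid"

# Give the group up to about 2 s to exit.
for _ in {1..20}; do
    [ "$(count_in_group "$pgid")" -eq 0 ] && break
    sleep 0.1
done
after=$(count_in_group "$pgid")
wait 2>/dev/null                    # reap the subshell

echo "live processes in group $pgid: before=$before after=$after"
```

Once the group is gone, the "after" count should be 0. With the real sudo setup the kill itself still has to go through sudo, since the group's processes belong to root.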

pynexj answered Nov 10 '22 11:11