I have a bash script start.sh which looks like this:
for thing in foo bar; do
    {
        background_processor "$thing"
        cleanup_on_exit "$thing"
    } &
done
This does what I want: I run start.sh, it exits with code 0, and the two subshells run in the background. Each subshell runs background_processor, and when that exits, it runs cleanup_on_exit. This works even if I exit the terminal from which I originally ran start.sh (even if that was an ssh connection).
Then I tried this:
ssh user@host "start.sh"
This works, except that after start.sh has exited, ssh apparently also waits for the subshells to exit. I don't really understand why. Once start.sh exits, the subshells become children of pid 1, and they are not even assigned a tty, so I can't understand how they are still associated with my ssh connection.
I later tried this:
ssh -t user@host "start.sh"
Now the processes have an assigned pseudo-tty. Now, I find that ssh does exit as soon as start.sh exits, but it also kills the child processes.
I guessed that the child processes were being sent SIGHUP in the latter case, so I did this:
ssh -t user@host "nohup start.sh"
That actually works! So, I have a solution to my practical problem, but I would like to grasp the subtleties of the SIGHUP/tty stuff here.
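For reference, the effect of nohup can be checked locally without ssh. nohup runs its command with SIGHUP ignored, and an ignored signal stays ignored across fork and exec. The sketch below is only an illustration: hup_outcome is a made-up helper name, sleep stands in for background_processor, and it assumes the calling shell was not itself started with SIGHUP ignored.

```shell
#!/usr/bin/env bash
# hup_outcome [wrapper]: start "sleep 30" in the background (optionally
# through a wrapper such as nohup), send it SIGHUP, and report what happened.
# The helper name is hypothetical; sleep is a stand-in for the real job.
hup_outcome() {
    "$@" sleep 30 < /dev/null > /dev/null 2>&1 &
    local pid=$!
    sleep 1                          # give a wrapper time to set up and exec
    kill -HUP "$pid"
    sleep 1
    kill -TERM "$pid" 2>/dev/null    # stop the job if it survived the SIGHUP
    wait "$pid" 2>/dev/null
    case $? in
        129) echo "killed" ;;        # 128 + 1: terminated by SIGHUP
        143) echo "survived" ;;      # 128 + 15: only the later SIGTERM stopped it
        *)   echo "unexpected" ;;
    esac
}

echo "plain sleep:  $(hup_outcome)"
echo "nohup sleep:  $(hup_outcome nohup)"
```

The exit status from wait tells the two cases apart, which avoids guessing from ps output whether the process is still alive.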
In summary, my questions are:
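- Why does ssh (without -t) wait for the child processes even after start.sh exits, even though they have parent pid 1?
- Why does ssh (with -t) kill the child processes, apparently with a SIGHUP, even though that does not happen when I run them from a terminal and log out of that terminal?

I think I can explain this now! I had to learn a bit about what sessions and process groups are, which I did by reading The TTY Demystified.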
- Why does ssh (without -t) wait for the child processes even after start.sh exits, even though they have parent pid 1?
Because with no tty, ssh connects to stdin/stdout/stderr of the shell process via pipes (which are then inherited by the children), and the version of OpenSSH that I am using (OpenSSH_4.3p2) waits for those pipes to close before exiting. Some earlier versions of OpenSSH did not behave that way. There is a good explanation of this, with rationale, here.
Conversely, when using an interactive login (or ssh -t), ssh and the processes are using a TTY and so there are no pipes to wait for.
I can recover the behaviour I want by redirecting the streams. This variant returns immediately: ssh user@host "start.sh < /dev/null > /dev/null 2>&1"
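The same effect can be reproduced locally without ssh, which I found helpful for convincing myself. This is only a sketch: start_inheriting and start_detached are made-up stand-ins for start.sh, sleep 2 stands in for the background job, and cat plays the role of the ssh side reading from the pipe.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for start.sh: each backgrounds a 2-second job and
# returns immediately.
start_inheriting() { sleep 2 & }                                  # job keeps the inherited stdout
start_detached()   { sleep 2 < /dev/null > /dev/null 2>&1 & }     # job lets go of all three streams

# Run one of the functions above with its stdout connected to a pipe and
# print how many whole seconds the reader (cat) waited for end-of-file.
time_reader() {
    local begin=$SECONDS
    "$1" | cat
    echo $(( SECONDS - begin ))
}

echo "inherited stdout:  reader waited $(time_reader start_inheriting)s"
echo "redirected stdout: reader waited $(time_reader start_detached)s"
```

In the first case the reader only sees end-of-file once the background job exits, because the job still holds the write end of the pipe. In the second case the pipe closes as soon as the function returns, so the reader finishes almost at once.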
- Why does ssh (with -t) kill the child processes, apparently with a SIGHUP, even though that does not happen when I run them from a terminal and log out of that terminal?
Because bash is starting in non-interactive mode, which means that job control is disabled by default, and consequently the child processes are in the same process group as the parent bash process (which is the session leader). When the parent bash process exits, the kernel sends SIGHUP to its process group (which is in the foreground) as described in setpgid(2):
If a session has a controlling terminal, ... [and] the session leader exits, the SIGHUP signal will be sent to each process in the foreground process group of the controlling terminal.
Conversely, when using an interactive login, bash is in interactive mode which means that job control is enabled by default, and so the child processes go into a separate process group and never receive the SIGHUP when I exit.
I can recover the behaviour I want by using set -m to enable job control in bash. If I add set -m to start.sh, the children are no longer killed when ssh exits.
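The process-group difference can be observed directly. The following is a rough sketch, assuming a ps that supports the POSIX -o pgid= and -p options; child_group is a made-up helper name and sleep 1 stands in for the real background job.

```shell
#!/usr/bin/env bash
# child_group SETUP: run SETUP (for example ":" or "set -m") in a fresh
# non-interactive bash, start a background job, and report whether the job
# shares the shell's process group or leads a group of its own.
# The helper name is hypothetical.
child_group() {
    bash -c "$1"'
        sleep 1 > /dev/null &        # stdout redirected so callers are not kept waiting
        child_pgid=$(ps -o pgid= -p "$!" | tr -d " ")
        shell_pgid=$(ps -o pgid= -p "$$" | tr -d " ")
        if [ "$child_pgid" = "$!" ]; then
            echo own                 # the job is the leader of its own process group
        elif [ "$child_pgid" = "$shell_pgid" ]; then
            echo shared              # the job is in the same group as the shell
        else
            echo other
        fi
    '
}

echo "default (no job control): $(child_group ':')"
echo "with set -m:              $(child_group 'set -m')"
```

A job that leads its own process group is not part of the session leader's foreground group, which is why it is not hit by the SIGHUP described above.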
Mysteries solved :)