
Why does running a background task over ssh fail if a pseudo-tty is allocated?

I've recently run into some slightly odd behaviour when running commands over ssh. I would be interested to hear any explanations for the behaviour below.

Running ssh localhost 'touch foobar &' creates a file called foobar as expected:

    [bob@server ~]$ ssh localhost 'touch foobar &'
    [bob@server ~]$ ls foobar
    foobar

However running the same command but with the -t option to force pseudo-tty allocation fails to create foobar:

    [bob@server ~]$ ssh -t localhost 'touch foobar &'
    Connection to localhost closed.
    [bob@server ~]$ echo $?
    0
    [bob@server ~]$ ls foobar
    ls: cannot access foobar: No such file or directory

My current theory is that, because the touch process is being backgrounded, the pseudo-tty is allocated and deallocated before the process has a chance to run. Certainly, adding a one-second sleep allows touch to run as expected:

    [bob@pidora ~]$ ssh -t localhost 'touch foobar & sleep 1'
    Connection to localhost closed.
    [bob@pidora ~]$ ls foobar
    foobar

If anyone has a definitive explanation I would be very interested to hear it. Thanks.

Floating Octothorpe asked Sep 03 '15 19:09



2 Answers

Oh, that's a good one.

This is related to how process groups work, how bash behaves when invoked as a non-interactive shell with -c, and the effect of & in input commands.

The answer assumes you're familiar with how job control works in UNIX; if you're not, here's a high-level view: every process belongs to a process group (the processes in the same group are often put there as part of a command pipeline, e.g. cat file | sort | grep 'word' would place the processes running cat(1), sort(1) and grep(1) in the same process group). bash is a process like any other, and it also belongs to a process group. Process groups are part of a session (a session is composed of one or more process groups). In a session, there is at most one foreground process group, and possibly many background process groups. The foreground process group has control of the terminal (if there is a controlling terminal attached to the session); the session leader (bash) moves process groups between background and foreground with tcsetpgrp(3). A signal sent to a process group is delivered to every process in that group.

If the concept of process groups and job control is completely new to you, I think you'll need to read up on that to fully understand this answer. A great resource to learn this is Chapter 9 of Advanced Programming in the UNIX Environment (3rd edition).

That being said, let's see what is happening here. We have to fit together every piece of the puzzle.

In both cases, the ssh remote side invokes bash(1) with -c. The -c flag causes bash(1) to run as a non-interactive shell. From the manpage:

An interactive shell is one started without non-option arguments and without the -c option whose standard input and error are both connected to terminals (as determined by isatty(3)), or one started with the -i option. PS1 is set and $- includes i if bash is interactive, allowing a shell script or a startup file to test this state.

Also, it is important to know that job control is disabled when bash is started in non-interactive mode. This means that bash will not create a separate process group to run the command: with job control disabled, there is no need to move the command between foreground and background, so it simply stays in the same process group as bash. This happens whether or not you force PTY allocation on ssh with -t.
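
The process-group claim can be checked locally, without ssh. The following is a minimal sketch, assuming Linux: it reads field 5 of /proc/PID/stat, which proc(5) documents as the process group ID. show_pgids is a hypothetical helper name, not part of bash or ssh. It prints the process group of a bash -c shell, followed by the process group of a command that shell backgrounds.

```shell
# Hypothetical helper for illustration (Linux only, relies on /proc).
# $1 is an optional prefix for the inner script, e.g. 'set -m;'.
# Prints "<pgid of the bash -c shell> <pgid of the backgrounded sleep>".
show_pgids() {
  bash -c "$1"'
    sleep 2 >/dev/null 2>&1 &
    echo "$(cut -d" " -f5 /proc/$$/stat) $(cut -d" " -f5 /proc/$!/stat)"
  '
}

show_pgids ''          # job control off (the default for bash -c)
show_pgids 'set -m;'   # job control forced on
```

With job control off, the two numbers should be identical, because the background command stays in the shell's process group. With set -m, they should differ, because bash places the background job in a process group of its own.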

However, the use of & has the side effect of causing the shell not to wait for command termination (even if job control is disabled). From the manpage:

If a command is terminated by the control operator &, the shell executes the command in the background in a subshell. The shell does not wait for the command to finish, and the return status is 0. Commands separated by a ; are executed sequentially; the shell waits for each command to terminate in turn. The return status is the exit status of the last command executed.

So, in both cases, bash will not wait for command execution, and touch(1) will be executed in the same process group as bash(1).

Now, consider what happens when a session leader exits. Quoting from the setpgid(2) manpage:

If a session has a controlling terminal, and the CLOCAL flag for that terminal is not set, and a terminal hangup occurs, then the session leader is sent a SIGHUP. If the session leader exits, then a SIGHUP signal will also be sent to each process in the foreground process group of the controlling terminal.

(Emphasis mine)

When you don't use -t

When you don't use -t, there is no PTY allocation on the remote side, so bash is not a session leader, and in fact no new session is created. Because sshd is running as a daemon, the bash process that is forked + exec()'d will not have a controlling terminal. As such, even though the shell terminates very quickly (probably before touch(1)), there is no SIGHUP sent to the process group, because bash wasn't a session leader (and there is no controlling terminal). So everything works.

When you use -t

-t forces PTY allocation, which means that the ssh remote side will call setsid(2), allocate a pseudo-terminal + fork a new process with forkpty(3), connect the PTY master device input and output to the socket endpoints that lead to your machine, and finally execute bash(1). forkpty(3) opens the PTY slave side in the forked process that will become bash; since there's no controlling terminal for the current session, and a terminal device is being opened, the PTY device becomes the controlling terminal for the session and bash becomes the session leader.

Then the same thing happens again: touch(1) is executed in the same process group, etc., yadda yadda. The point is, this time, there is a session leader and a controlling terminal. So, since bash does not bother waiting because of the &, when it exits, SIGHUP is delivered to the process group and touch(1) dies prematurely.

About nohup

nohup(1) doesn't work here because there is still a race condition. If bash(1) terminates before nohup(1) has had the chance to set up the necessary signal handling and file redirection, it will have no effect (which is probably what happens).
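
One way to sidestep that race, offered here as an assumption rather than something taken from the original answer, is to have the shell itself ignore SIGHUP before it forks the command. Ignored signal dispositions are inherited across fork and exec, so the child never has a window in which SIGHUP still has its default action. The sketch below illustrates the inheritance locally; hup_exit_status is a hypothetical helper name.

```shell
# Hypothetical helper for illustration.
# Usage: hup_exit_status ignore|default
# Starts a background sleep, sends it SIGHUP and then SIGTERM, and prints
# the exit status that wait reports for it.
hup_exit_status() {
  if [ "$1" = ignore ]; then
    trap '' HUP                 # ignore SIGHUP; children inherit this disposition
  fi
  sleep 30 >/dev/null 2>&1 &
  pid=$!
  trap - HUP                    # restore the previous disposition in this shell
  sleep 0.2                     # give the child time to start
  kill -HUP "$pid" 2>/dev/null || true
  sleep 0.2
  kill -TERM "$pid" 2>/dev/null || true   # clean up if the child survived SIGHUP
  status=0
  wait "$pid" || status=$?
  echo "$status"
}

hup_exit_status default
hup_exit_status ignore
```

A status of 143 (128 + SIGTERM) means the child outlived the SIGHUP and was only stopped by the cleanup signal; 129 (128 + SIGHUP) means the hangup killed it. The corresponding ssh invocation would presumably be ssh -t localhost 'trap "" HUP; touch foobar &'; treat that as an untested assumption.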

A possible fix

Forcefully re-enabling job control fixes it. In bash, you do that with set -m. This works:

ssh -t localhost 'set -m ; touch foobar &' 

Or force bash to wait for touch(1) to complete:

ssh -t localhost 'touch foobar & wait `pgrep touch`' 
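
A variant of the second fix, assuming bash (or another POSIX shell) on the remote side: wait with no arguments, or wait "$!", waits for the shell's own background children, which avoids depending on pgrep matching exactly the right touch process. The sketch below shows the pattern locally, without ssh; touch_and_wait is a hypothetical helper name.

```shell
# Hypothetical helper for illustration; $1 is the file to create.
# The non-interactive shell backgrounds touch and then waits for it, so the
# file exists by the time bash -c returns.
touch_and_wait() {
  bash -c 'touch "$1" & wait "$!"' _ "$1"
}

# Over ssh, the same idea would look like this (untested assumption):
#   ssh -t localhost 'touch foobar & wait'
```

Because the shell does not exit until touch has finished, no SIGHUP can reach touch while it is still running.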
Filipe Gonçalves answered Sep 21 '22 18:09


The answer by @Filipe Gonçalves is great, but one part of it is wrong. I don't have enough reputation to comment there, so I'll correct and expand on it here:

When you don't use -t,

@Filipe says:

When you don't use -t, there is no PTY allocation on the remote side, so bash is not a session leader, and in fact no new session is created. ...

Actually, bash is a session leader and a new session is created.

Let us test this:

    # run sleep background process first, then call ps directly:
    [root@90fb1c3f30ce ~]# ssh localhost  'sleep 66 & ps -o pid,ppid,pgid,sess,tpgid,tty,args'
        PID    PPID    PGID    SESS   TPGID TT       COMMAND
     184074      67  184074  184074      -1 ?        sshd: root@notty
     184076  184074  184076  184076      -1 ?        bash -c sleep 66 & ps -o pid,ppid,pgid,sess,tpgid,tty,args
     184081  184076  184076  184076      -1 ?        sleep 66
     184082  184076  184076  184076      -1 ?        ps -o pid,ppid,pgid,sess,tpgid,tty,args
     Notice           ^^^^^   ^^^^^

We can see that the bash/sleep/ps processes share the same PGID/SESS, which equals the PID of the bash process (184076), while the parent sshd process has a different PGID/SESS. So the bash process is the leader of a new session, and the bash/sleep/ps processes belong to a process group separate from sshd's.
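
The same check can be scripted. This is a minimal sketch assuming Linux, where field 6 of /proc/PID/stat is the session ID (see proc(5)); session_of and is_session_leader are hypothetical helper names. A process is a session leader exactly when its session ID equals its own PID, which is what the ps output above shows for bash.

```shell
# Hypothetical helpers for illustration (Linux only, relies on /proc).
# /proc/PID/stat starts with: pid (comm) state ppid pgrp session ...
session_of() {
  stat=$(cat "/proc/$1/stat") || return 1
  rest=${stat##*) }      # strip "pid (comm) "; comm itself may contain spaces
  set -- $rest           # now: $1=state $2=ppid $3=pgrp $4=session
  echo "$4"
}

# Succeeds when the process with PID $1 leads its session.
is_session_leader() {
  [ "$(session_of "$1")" = "$1" ]
}
```

For example, is_session_leader "$$" reports whether the current shell leads its session; a shell started by sshd for a remote command would be expected to pass this check.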

In addition, the ssh command does not return right away; it waits for about 66 seconds. The reason is explained here: Getting ssh to execute a command in the background on target machine

While the ssh command is waiting, we can open another session and run:

    [root@90fb1c3f30ce ~]# ps -eo pid,ppid,pgid,sess,tpgid,tty,args
        PID    PPID    PGID    SESS   TPGID TT       COMMAND
        # unrelated lines removed #
     184074      67  184074  184074      -1 ?        sshd: root@notty
     184081       1  184076  184076      -1 ?        sleep 66
     Notice           ^^^^^   ^^^^^
    [root@90fb1c3f30ce ~]# ps -e | grep 184076
    [root@90fb1c3f30ce ~]#

We can see that the bash process (PID 184076) is already gone, but the PGID/SESS of the background sleep process is unchanged. That is fine; from APUE section 9.4:

Each process group can have a process group leader. The leader is identified by its process group ID being equal to its process ID.

It is possible for a process group leader to create a process group, create processes in the group, and then terminate. The process group still exists, as long as at least one process is in the group, regardless of whether the group leader terminates.

So, why doesn't this sleep process die?

When you don't use -t, there is no PTY allocation on the remote side, so the process group on the remote side is not a foreground process group (without a terminal, foreground and background have no meaning). As such, even though the shell terminates very quickly, no SIGHUP is sent to its process group, because it is not the foreground process group of a controlling terminal. (SIGHUP is only sent to each process in the foreground process group of the controlling terminal.)

Tao Sfqh answered Sep 20 '22 18:09