Why does running a background task over ssh fail if a pseudo-tty is allocated?


I've recently run into some slightly odd behaviour when running commands over ssh. I would be interested to hear any explanations for the behaviour below.

Running ssh localhost 'touch foobar &' creates a file called foobar as expected:

    [bob@server ~]$ ssh localhost 'touch foobar &'
    [bob@server ~]$ ls foobar
    foobar

However, running the same command with the -t option to force pseudo-tty allocation fails to create foobar:

    [bob@server ~]$ ssh -t localhost 'touch foobar &'
    Connection to localhost closed.
    [bob@server ~]$ echo $?
    0
    [bob@server ~]$ ls foobar
    ls: cannot access foobar: No such file or directory

My current theory is that, because the touch process is being backgrounded, the pseudo-tty is allocated and deallocated before the process has a chance to run. Certainly, adding a one-second sleep allows touch to run as expected:

    [bob@pidora ~]$ ssh -t localhost 'touch foobar & sleep 1'
    Connection to localhost closed.
    [bob@pidora ~]$ ls foobar
    foobar

If anyone has a definitive explanation I would be very interested to hear it. Thanks.


asked Sep 03 '15 19:09

Floating Octothorpe

2 Answers

Oh, that's a good one.

This is related to how process groups work, how bash behaves when invoked as a non-interactive shell with -c, and the effect of & in input commands.

The answer assumes you're familiar with how job control works in UNIX; if you're not, here's a high-level view. Every process belongs to a process group (the processes in the same group are often put there as part of a command pipeline, e.g. cat file | sort | grep 'word' would place the processes running cat(1), sort(1) and grep(1) in the same process group). bash is a process like any other, and it also belongs to a process group. Process groups are part of a session (a session is composed of one or more process groups). A session has at most one foreground process group and possibly many background process groups. The foreground process group has control of the terminal (if there is a controlling terminal attached to the session); the session leader (bash) moves process groups between background and foreground with tcsetpgrp(3). A signal sent to a process group is delivered to every process in that group.

If the concept of process groups and job control is completely new to you, I think you'll need to read up on that to fully understand this answer. A great resource to learn this is Chapter 9 of Advanced Programming in the UNIX Environment (3rd edition).

That being said, let's see what is happening here. We have to fit together every piece of the puzzle.

In both cases, the ssh remote side invokes bash(1) with -c. The -c flag causes bash(1) to run as a non-interactive shell. From the manpage:

An interactive shell is one started without non-option arguments and without the -c option whose standard input and error are both connected to terminals (as determined by isatty(3)), or one started with the -i option. PS1 is set and $- includes i if bash is interactive, allowing a shell script or a startup file to test this state.
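The test described in the manpage excerpt can be tried locally without ssh. This is only a minimal sketch: it assumes bash is on the PATH, and the variable name mode is made up for illustration.

```shell
# A shell started with -c is non-interactive, so $- should not contain "i".
mode=$(bash -c 'case $- in *i*) echo interactive ;; *) echo non-interactive ;; esac')
echo "$mode"
```

The same case statement can be dropped into a startup file to branch on whether the shell is interactive.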

Also, it is important to know that job control is disabled when bash is started in non-interactive mode. This means that bash will not create a separate process group to run the command: with job control disabled, there is no need to move the command between foreground and background, so it simply remains in the same process group as bash. This happens whether or not you forced PTY allocation on ssh with -t.
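A rough way to see this locally, without ssh, is to compare process group IDs inside a bash -c invocation. The sketch below assumes a Linux system with procps ps; the helper pgid_of and the variable result are hypothetical names used only here.

```shell
# Start a non-interactive bash, background a command, and compare its
# process group ID with the shell's own.
result=$(bash -c '
  # Hypothetical helper: print the process group ID of a PID.
  pgid_of() { ps -o pgid= -p "$1" | tr -d " "; }
  sleep 1 &
  if [ "$(pgid_of $$)" = "$(pgid_of $!)" ]; then
    echo same-group
  else
    echo different-group
  fi
  wait
')
echo "$result"
```

With job control off, the backgrounded sleep is expected to report the same process group as the shell that started it.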

However, the use of & has the side effect of causing the shell not to wait for command termination (even if job control is disabled). From the manpage:

If a command is terminated by the control operator &, the shell executes the command in the background in a subshell. The shell does not wait for the command to finish, and the return status is 0. Commands separated by a ; are executed sequentially; the shell waits for each command to terminate in turn. The return status is the exit status of the last command executed.

So, in both cases, bash will not wait for command execution, and touch(1) will be executed in the same process group as bash(1).
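Both effects of & quoted above (status 0, no waiting) can be observed in a small local sketch. It assumes bash is available; status and elapsed are illustrative variable names, not anything from the original answer.

```shell
# "false" exits non-zero, but starting it with & yields status 0 right away.
status=$(bash -c 'false & echo $?')
echo "status after '&': $status"

# The shell does not wait for the background sleep: bash's SECONDS counter
# is printed before the 2-second sleep could have finished.
elapsed=$(bash -c 'sleep 2 >/dev/null 2>&1 & echo "$SECONDS"')
echo "seconds elapsed when the shell moved on: $elapsed"
```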

Now, consider what happens when a session leader exits. Quoting from the setpgid(2) manpage:

If a session has a controlling terminal, and the CLOCAL flag for that terminal is not set, and a terminal hangup occurs, then the session leader is sent a SIGHUP. If the session leader exits, then a SIGHUP signal will also be sent to each process in the foreground process group of the controlling terminal.

(Emphasis mine)

When you don't use -t

When you don't use -t, there is no PTY allocation on the remote side, so bash is not a session leader, and in fact no new session is created. Because sshd is running as a daemon, the bash process that is forked + exec()'d will not have a controlling terminal. As such, even though the shell terminates very quickly (probably before touch(1)), there is no SIGHUP sent to the process group, because bash wasn't a session leader (and there is no controlling terminal). So everything works.

When you use -t

-t forces PTY allocation, which means that the ssh remote side will call setsid(2), allocate a pseudo-terminal + fork a new process with forkpty(3), connect the PTY master device input and output to the socket endpoints that lead to your machine, and finally execute bash(1). forkpty(3) opens the PTY slave side in the forked process that will become bash; since there's no controlling terminal for the current session, and a terminal device is being opened, the PTY device becomes the controlling terminal for the session and bash becomes the session leader.

Then the same thing happens again: touch(1) is executed in the same process group, etc., yadda yadda. The point is, this time, there is a session leader and a controlling terminal. So, since bash does not bother waiting because of the &, when it exits, SIGHUP is delivered to the process group and touch(1) dies prematurely.

About nohup

nohup(1) doesn't work here because there is still a race condition. If bash(1) terminates before nohup(1) has the chance to set up the necessary signal handling and file redirection, it will have no effect (which is probably what happens).

A possible fix

Forcefully re-enabling job control fixes it. In bash, you do that with set -m. This works:

ssh -t localhost 'set -m ; touch foobar &'

Or force bash to wait for touch(1) to complete:

ssh -t localhost 'touch foobar & wait `pgrep touch`'
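Both fixes can be explored locally without ssh. The sketch below is an assumption-laden illustration, not the answer's own method: it uses wait $! (the PID of the job bash just started) in place of the pgrep lookup, which could also match unrelated touch processes, and it checks that set -m puts the background job into its own process group, which is presumably why the job is no longer in the foreground group that receives SIGHUP. It assumes Linux with procps ps; tmpdir, jc and pgid_of are made-up names.

```shell
tmpdir=$(mktemp -d)

# Waiting variant: wait for the job that was just backgrounded.
bash -c 'touch "$1/foobar" & wait $!' _ "$tmpdir"
ls "$tmpdir"

# Job control variant: with set -m the background job should lead its own
# process group, so its PGID equals its PID.
jc=$(bash -c '
  pgid_of() { ps -o pgid= -p "$1" | tr -d " "; }
  set -m
  sleep 1 &
  if [ "$(pgid_of $!)" = "$!" ]; then
    echo own-group
  else
    echo shared-group
  fi
  wait
')
echo "$jc"
```

Over ssh, the waiting variant would read ssh -t localhost 'touch foobar & wait $!'.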

answered Sep 21 '22 18:09

Filipe Gonçalves

The answer by @Filipe Gonçalves is great, but one part of it is wrong. I don't have enough reputation to comment there, so I'll correct and expand on it here:

When you don't use -t,

@Filipe says:

When you don't use -t, there is no PTY allocation on the remote side, so bash is not a session leader, and in fact no new session is created. ...

Actually, bash is a session leader, and a new session is created.

Let us test this:

    # run sleep background process first, then call ps directly:
    [root@90fb1c3f30ce ~]# ssh localhost 'sleep 66 & ps -o pid,ppid,pgid,sess,tpgid,tty,args'
        PID    PPID    PGID    SESS   TPGID TT       COMMAND
     184074      67  184074  184074      -1 ?        sshd: root@notty
     184076  184074  184076  184076      -1 ?        bash -c sleep 66 & ps -o pid,ppid,pgid,sess,tpgid,tty,args
     184081  184076  184076  184076      -1 ?        sleep 66
     184082  184076  184076  184076      -1 ?        ps -o pid,ppid,pgid,sess,tpgid,tty,args
    Notice            ^^^^^   ^^^^^

We can see that the bash/sleep/ps processes share the same PGID/SESS, which equals the PID of the bash process (184076), while the parent sshd process has a different PGID/SESS. So the bash process is the leader of a new session, and the bash/sleep/ps processes belong to a different process group than sshd.
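The same pattern can be reproduced locally with setsid(1), which runs a command in a new session. This is only a sketch under assumptions: Linux with util-linux setsid and procps ps, and the variable names are chosen for illustration. It does not involve sshd at all; it just shows what a session leader without a controlling terminal looks like in ps.

```shell
# Run bash in a new session and print its own PID, PGID, session ID and TTY.
out=$(setsid bash -c 'ps -o pid= -o pgid= -o sess= -o tty= -p $$')
set -- $out   # split the ps fields into positional parameters
pid=$1 pgid=$2 sess=$3 tty=$4
echo "pid=$pid pgid=$pgid sess=$sess tty=$tty"
```

For a session leader the three IDs should match, and the TTY column is expected to show ? when the session has no controlling terminal, as in the ssh output above.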

In addition, note that the ssh command does not return right away; it waits about 66 seconds. The reason is explained in the question "Getting ssh to execute a command in the background on target machine".

While the ssh command is waiting, we can open another session and run:

    [root@90fb1c3f30ce ~]# ps -eo pid,ppid,pgid,sess,tpgid,tty,args
        PID    PPID    PGID    SESS   TPGID TT       COMMAND
        # unrelated lines removed #
     184074      67  184074  184074      -1 ?        sshd: root@notty
     184081       1  184076  184076      -1 ?        sleep 66
    Notice            ^^^^^   ^^^^^
    [root@90fb1c3f30ce ~]# ps -e | grep 184076
    [root@90fb1c3f30ce ~]#

We can see that the bash process (PID 184076) is already gone, but the PGID/SESS of the background sleep process remain unchanged. That is fine, as APUE section 9.4 explains:

Each process group can have a process group leader. The leader is identified by its process group ID being equal to its process ID.

It is possible for a process group leader to create a process group, create processes in the group, and then terminate. The process group still exists, as long as at least one process is in the group, regardless of whether the group leader terminates.

So, why doesn't this sleep process die?

When you don't use -t, there is no PTY allocation on the remote side, so the process group on the remote side is not a foreground process group (without a terminal, foreground and background have no meaning). As such, even though the shell terminates very quickly, no SIGHUP is sent to its process group, because it is not a foreground process group. (When a session leader exits, SIGHUP is sent to each process in the foreground process group of the controlling terminal.)

answered Sep 20 '22 18:09

Tao Sfqh
