 

Why does /usr/bin/timeout kill the entire pipe?

Tags: linux, bash, shell

I have a program that generates output indefinitely. I want to sample that output for one second and pipe it into gzip. I'm using the timeout utility to limit the execution time, but the problem is that gzip gets killed as well.

E.g.:

$ /usr/bin/timeout 1 bash -c "echo asdf; sleep 5" | gzip > /tmp/foo.gz; ls -lah /tmp/foo.gz 
Terminated
-rw-rw-r-- 1 haizaar haizaar 0 Jul 22 15:05 /tmp/foo.gz

As you can see, gzip is terminated too, so its buffered output is lost and the resulting file is empty.

I don't understand how timeout manages to kill a process that merely reads its stdout, or how to fix it. Even wrapping the whole thing in another bash gives the same result:

$ bash -c '/usr/bin/timeout 1 bash -c "echo asdf; sleep 5"' | gzip > /tmp/foo.gz; ls -lah /tmp/foo.gz
Terminated
-rw-rw-r-- 1 haizaar haizaar 0 Jul 22 15:30 /tmp/foo.gz

If I prepend setsid to timeout, it works, which makes me think the problem is related to process groups being mixed up. Still, I find it hard to accept that the current behavior is "by design", because it makes the timeout command very tricky to use in shell pipelines.

Environment:
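
For reference, the setsid variant mentioned above can be sketched as follows. The temporary file and the variable names are illustrative, not part of the original commands; the script assumes setsid (util-linux), timeout (coreutils), gzip and zcat are installed.

```shell
#!/usr/bin/env bash
# Illustrative output file; any writable path works.
out=$(mktemp)

# setsid starts timeout in a new session (and therefore a new process
# group), so the signal timeout sends to its own group cannot reach gzip.
setsid timeout 1 bash -c 'echo asdf; sleep 5' | gzip > "$out"

# Decompress to check that gzip was able to flush what it had read.
content=$(zcat "$out")
echo "archive contains: $content"
rm -f "$out"
```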

$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"

$ bash --version
GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

$ timeout --version
timeout (GNU coreutils) 8.30
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Padraig Brady.

UPDATE: KamilCuk was spot on with his strace analysis. It also explains why wrapping timeout in another bash doesn't help: bash appears to have an optimization where, if it has only one command to run, it doesn't fork but execs, replacing itself. If you add another command to the wrapping bash, it forks, so timeout ends up in a new process group, which limits the blast radius of the timeout command. I.e.:

bash -c 'true; /usr/bin/timeout 1 bash -c "echo asdf; sleep 5"' | gzip > /tmp/foo.gz

(note the leading true)

I still think using timeout in pipelines is black magic, but that's another story.

Asked by Zaar Hai, Jul 22 '21 05:07


1 Answer

$ strace -ff -e trace=setpgid,kill,exit_group,exit,execve,wait4 bash --norc --noprofile -ic "timeout -v 1 bash --norc --noprofile -c 'echo asdf ; sleep 5' | { sleep 2; echo 123; }" 
execve("/usr/bin/bash", ["bash", "--norc", "--noprofile", "-ic", "timeout -v 1 bash --norc --nopro"...], 0x7ffeb8ef7ef8 /* 76 vars */) = 0
setpgid(0, 28995)                       = 0
strace: Process 28996 attached
[pid 28995] setpgid(28996, 28996)       = 0
[pid 28996] setpgid(28996, 28996)       = 0
strace: Process 28997 attached
[pid 28995] setpgid(28997, 28996)       = 0
[pid 28995] wait4(-1,  <unfinished ...>
[pid 28997] setpgid(28997, 28996)       = 0
[pid 28996] execve("/usr/bin/timeout", ["timeout", "-v", "1", "bash", "--norc", "--noprofile", "-c", "echo asdf ; sleep 5"], 0x560da0ff57e0 /* 76 vars */strace: Process 28998 attached
) = 0
[pid 28997] wait4(-1,  <unfinished ...>
[pid 28998] execve("/usr/bin/sleep", ["sleep", "2"], 0x560da0ff57e0 /* 76 vars */) = 0
[pid 28996] setpgid(0, 0)               = 0
strace: Process 28999 attached
[pid 28996] wait4(28999, 0x7ffd7eb5e96c, WNOHANG, NULL) = 0
[pid 28999] execve("/usr/local/bin/bash", ["bash", "--norc", "--noprofile", "-c", "echo asdf ; sleep 5"], 0x7ffd7eb5ec10 /* 76 vars */) = -1 ENOENT (No such file or directory)
[pid 28999] execve("/usr/bin/bash", ["bash", "--norc", "--noprofile", "-c", "echo asdf ; sleep 5"], 0x7ffd7eb5ec10 /* 76 vars */) = 0
[pid 28999] execve("/usr/bin/sleep", ["sleep", "5"], 0x55a84be27270 /* 76 vars */) = 0
[pid 28996] --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_int=0, si_ptr=NULL} ---
timeout: sending signal TERM to command ‘bash’
[pid 28996] kill(28999, SIGTERM)        = 0
[pid 28999] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] kill(0, SIGTERM <unfinished ...>
[pid 28997] <... wait4 resumed>0x7ffc114a9600, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 28996] <... kill resumed>)         = 0
[pid 28999] +++ killed by SIGTERM +++
[pid 28998] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28997] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=28999, si_uid=1000, si_status=SIGTERM, si_utime=0, si_stime=0} ---
[pid 28997] +++ killed by SIGTERM +++
[pid 28995] <... wait4 resumed>[{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WSTOPPED|WCONTINUED, NULL) = 28997
[pid 28998] +++ killed by SIGTERM +++
[pid 28995] wait4(-1,  <unfinished ...>
[pid 28996] kill(28999, SIGCONT)        = 0
[pid 28996] kill(0, SIGCONT)            = 0
[pid 28996] --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] wait4(28999, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WNOHANG, NULL) = 28999
[pid 28996] exit_group(124)             = ?
[pid 28996] +++ exited with 124 +++
<... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 124}], WSTOPPED|WCONTINUED, NULL) = 28996
Terminated
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=28997, si_uid=1000, si_status=SIGTERM, si_utime=0, si_stime=0} ---
wait4(-1, 0x7ffc114a9710, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)
setpgid(0, 28992)                       = 0
exit_group(143)                         = ?
+++ exited with 143 +++

So timeout tries to be smart and signals its whole process group. From what I understand, the sequence is:

  • bash creates one process group for the pipeline and puts every stage in it: setpgid(28996, 28996) for timeout and setpgid(28997, 28996) for the right-hand side
  • timeout calls setpgid(0, 0), but its PID 28996 is already the ID of the pipeline's group, so it stays in that group, and the command it starts inherits the same group
  • when the time limit expires, timeout signals the whole process group: kill(0, SIGTERM)
  • because all pipeline processes are in that process group, all of them are killed.

You can make the left side of the pipe end up in its own process group by wrapping it in a command grouping: { ...; }.
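
A minimal sketch of that grouping, assuming the same sample command as in the question (the temporary file and variable names are illustrative). Whether bash forks a separate process for the group is taken from the statement above and may depend on the bash version; in a non-interactive script the pipeline survives either way, because job control is off.

```shell
#!/usr/bin/env bash
# Illustrative output file; any writable path works.
out=$(mktemp)

# The braces make the left side of the pipe a command group, so timeout
# is started from a subshell instead of being the pipeline stage itself.
{ timeout 1 bash -c 'echo asdf; sleep 5'; } | gzip > "$out"

# gzip should have survived and written a valid archive.
content=$(zcat "$out")
echo "archive contains: $content"
rm -f "$out"
```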

You could use timeout --foreground, but then timeout signals only the command it started, not the whole group. So while bash will die, gzip will keep waiting for the sleep 5 that is still running in the background, because that sleep holds the write end of the pipe open and gzip never sees end-of-file until it exits.
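
The --foreground caveat can be observed with a sketch like the one below. The trailing true, the 3-second sleep and the variable names are assumptions added for illustration: the trailing true is there so that bash does not exec sleep directly, which keeps bash and sleep as two separate processes.

```shell
#!/usr/bin/env bash
# Illustrative output file; any writable path works.
out=$(mktemp)
start=$(date +%s)

# After 1 second timeout sends SIGTERM to bash only. The orphaned sleep
# keeps the write end of the pipe open, so gzip waits until sleep exits.
timeout --foreground 1 bash -c 'echo asdf; sleep 3; true' | gzip > "$out"

elapsed=$(( $(date +%s) - start ))
content=$(zcat "$out")
echo "pipeline took ${elapsed}s, archive contains: $content"
rm -f "$out"
```

The data is intact, but the pipeline runs for roughly the duration of the sleep rather than the 1-second limit.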

I am guessing (partly from the commit message) that this is intentional, so that timeout can kill the whole pipeline as if it were a magic shell built-in.

Also, the behavior differs depending on whether job control is enabled, so it differs between interactive and non-interactive shells.
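
The job-control difference can be made visible by recording the process group of each pipeline stage with job control off and on. This is a Linux-only sketch that reads the pgrp field from /proc/self/stat; the helper names pgrp_of_self and probe are made up for this example, and set -m is used here to imitate an interactive shell inside a script.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)

# Store the process group of the current (sub)shell in PGRP.
# After the "(comm)" field, /proc/<pid>/stat lists: state ppid pgrp ...
pgrp_of_self() {
    local line
    read -r line < /proc/self/stat
    line=${line##*) }
    set -- $line
    PGRP=$3
}

# Run a two-stage pipeline in which each stage records its own group,
# then record the group of the shell itself.
probe() {
    { pgrp_of_self; echo "$PGRP" > "$tmp/left"; } |
        { cat > /dev/null; pgrp_of_self; echo "$PGRP" > "$tmp/right"; }
    pgrp_of_self; SHELL_PG=$PGRP
    read -r LEFT_PG < "$tmp/left"
    read -r RIGHT_PG < "$tmp/right"
}

set +m; probe    # job control off: the default for scripts
off_shell=$SHELL_PG; off_left=$LEFT_PG; off_right=$RIGHT_PG

set -m; probe    # job control on: as in an interactive shell
on_shell=$SHELL_PG; on_left=$LEFT_PG; on_right=$RIGHT_PG
set +m

echo "job control off: shell=$off_shell left=$off_left right=$off_right"
echo "job control on:  shell=$on_shell left=$on_left right=$on_right"
rm -rf "$tmp"
```

With job control on, both stages share a group of their own, separate from the shell's. That is the group timeout ends up signalling when it is the first stage of the pipeline.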

Answered Oct 11 '22 15:10 (3 revs)