I have a program that generates output indefinitely. I want to sample that output for one second and pipe it into gzip. I'm using the timeout utility to limit the execution, but the problem is that gzip gets killed as well.
E.g.:
$ /usr/bin/timeout 1 bash -c "echo asdf; sleep 5" | gzip > /tmp/foo.gz; ls -lah /tmp/foo.gz
Terminated
-rw-rw-r-- 1 haizaar haizaar 0 Jul 22 15:05 /tmp/foo.gz
As you can see, the gzip command is Terminated, so its output is an empty file (its buffered data is lost).
I don't understand how timeout manages to kill a process that reads its stdout, or how to fix it.
Wrapping the whole thing in another bash gives the same result:
$ bash -c '/usr/bin/timeout 1 bash -c "echo asdf; sleep 5"' | gzip > /tmp/foo.gz; ls -lah /tmp/foo.gz
Terminated
-rw-rw-r-- 1 haizaar haizaar 0 Jul 22 15:30 /tmp/foo.gz
If I prepend setsid to timeout, it works, which makes me think this is somehow related to process groups being mixed up. But it's hard to accept that the current behavior is "by design", because it makes the timeout command very tricky to use in shell pipelines.
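For reference, here is a sketch of the setsid workaround mentioned above. It assumes the util-linux setsid is installed: setsid starts timeout in a new session, and therefore in a new process group, so the SIGTERM that timeout sends to its own process group cannot reach gzip.

```shell
#!/usr/bin/env bash
# Start timeout in its own session (and thus its own process group), so the
# SIGTERM that timeout sends to its process group cannot reach gzip.
setsid timeout 1 bash -c "echo asdf; sleep 5" | gzip > /tmp/foo.gz
gzip -dc /tmp/foo.gz    # should print: asdf
```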
Environment:
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
$ bash --version
GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ timeout --version
timeout (GNU coreutils) 8.30
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Padraig Brady.
UPDATE
KamilCuk was spot on with his strace analysis. It also explains why wrapping timeout in another bash doesn't help: bash seems to have an optimization where, if it has only one command to run, it doesn't fork but execs that command, replacing itself.
But if you add another command to the wrapping bash, it forks instead, so timeout runs in a child process that is not the process-group leader. timeout's own setpgid(0, 0) then creates a new process group, which limits the blast radius of the timeout command. I.e.:
bash -c 'true; /usr/bin/timeout 1 bash -c "echo asdf; sleep 5"' | gzip > /tmp/foo.gz
(note the leading true)
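The exec optimization can be observed directly. In this sketch (the /tmp/pid_* file names are arbitrary), the outer shell records the PID it launched via $!, and the innermost sh reports its own PID via $$. With a single command the two match, because the wrapper bash exec'd sh in place. With the leading true, bash 5.0 (as in the question) forks, so they differ; newer bash releases may optimize that fork away as well, so only the first case should be relied on.

```shell
#!/usr/bin/env bash
# A wrapper bash whose -c string is a single simple command execs it in place,
# so sh keeps the wrapper's PID; compare with the PID the parent saw in $!.
bash -c 'sh -c "echo \$\$"' > /tmp/pid_single &
launched_single=$!
# With a leading "true", bash 5.0 forks sh as a child, so the PIDs differ;
# newer bash releases may optimize this fork away too.
bash -c 'true; sh -c "echo \$\$"' > /tmp/pid_double &
launched_double=$!
wait
echo "single command: launched $launched_single, sh ran as $(cat /tmp/pid_single)"
echo "true; command:  launched $launched_double, sh ran as $(cat /tmp/pid_double)"
```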
I still think using timeout in pipelines is black magic, but that's another story.
$ strace -ff -e trace=setpgid,kill,exit_group,exit,execve,wait4 bash --norc --noprofile -ic "timeout -v 1 bash --norc --noprofile -c 'echo asdf ; sleep 5' | { sleep 2; echo 123; }"
execve("/usr/bin/bash", ["bash", "--norc", "--noprofile", "-ic", "timeout -v 1 bash --norc --nopro"...], 0x7ffeb8ef7ef8 /* 76 vars */) = 0
setpgid(0, 28995) = 0
strace: Process 28996 attached
[pid 28995] setpgid(28996, 28996) = 0
[pid 28996] setpgid(28996, 28996) = 0
strace: Process 28997 attached
[pid 28995] setpgid(28997, 28996) = 0
[pid 28995] wait4(-1, <unfinished ...>
[pid 28997] setpgid(28997, 28996) = 0
[pid 28996] execve("/usr/bin/timeout", ["timeout", "-v", "1", "bash", "--norc", "--noprofile", "-c", "echo asdf ; sleep 5"], 0x560da0ff57e0 /* 76 vars */strace: Process 28998 attached
) = 0
[pid 28997] wait4(-1, <unfinished ...>
[pid 28998] execve("/usr/bin/sleep", ["sleep", "2"], 0x560da0ff57e0 /* 76 vars */) = 0
[pid 28996] setpgid(0, 0) = 0
strace: Process 28999 attached
[pid 28996] wait4(28999, 0x7ffd7eb5e96c, WNOHANG, NULL) = 0
[pid 28999] execve("/usr/local/bin/bash", ["bash", "--norc", "--noprofile", "-c", "echo asdf ; sleep 5"], 0x7ffd7eb5ec10 /* 76 vars */) = -1 ENOENT (No such file or directory)
[pid 28999] execve("/usr/bin/bash", ["bash", "--norc", "--noprofile", "-c", "echo asdf ; sleep 5"], 0x7ffd7eb5ec10 /* 76 vars */) = 0
[pid 28999] execve("/usr/bin/sleep", ["sleep", "5"], 0x55a84be27270 /* 76 vars */) = 0
[pid 28996] --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_int=0, si_ptr=NULL} ---
timeout: sending signal TERM to command ‘bash’
[pid 28996] kill(28999, SIGTERM) = 0
[pid 28999] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] kill(0, SIGTERM <unfinished ...>
[pid 28997] <... wait4 resumed>0x7ffc114a9600, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 28996] <... kill resumed>) = 0
[pid 28999] +++ killed by SIGTERM +++
[pid 28998] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28997] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=28999, si_uid=1000, si_status=SIGTERM, si_utime=0, si_stime=0} ---
[pid 28997] +++ killed by SIGTERM +++
[pid 28995] <... wait4 resumed>[{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WSTOPPED|WCONTINUED, NULL) = 28997
[pid 28998] +++ killed by SIGTERM +++
[pid 28995] wait4(-1, <unfinished ...>
[pid 28996] kill(28999, SIGCONT) = 0
[pid 28996] kill(0, SIGCONT) = 0
[pid 28996] --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] wait4(28999, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WNOHANG, NULL) = 28999
[pid 28996] exit_group(124) = ?
[pid 28996] +++ exited with 124 +++
<... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 124}], WSTOPPED|WCONTINUED, NULL) = 28996
Terminated
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=28997, si_uid=1000, si_status=SIGTERM, si_utime=0, si_stime=0} ---
wait4(-1, 0x7ffc114a9710, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)
setpgid(0, 28992) = 0
exit_group(143) = ?
+++ exited with 143 +++
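So what happens is that timeout tries to be smart and kills the whole process group. From what I understand, the sequence is:
1. bash (with job control enabled) makes the first process of the pipeline the leader of a new process group, setpgid(28996, 28996), and puts the right side of the pipe into the same group, setpgid(28997, 28996).
2. timeout (PID 28996) calls setpgid(0, 0), which changes nothing, because it is already the leader of that group.
3. When the time is up, timeout kills the whole process group with kill(0, SIGTERM <unfinished ...>, so the processes on the right side of the pipe (28997 and 28998) receive SIGTERM as well.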
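The process-group bookkeeping can also be checked without strace. The sketch below is Linux-specific (it reads the process group ID from field 5 of /proc/self/stat) and writes to an arbitrary temporary file, /tmp/pgids. Run at an interactive prompt, both sides are expected to report the same PGID, which is the situation in the trace; run as a script, where job control is off, timeout's setpgid(0, 0) gives the left side a group of its own and the numbers differ.

```shell
#!/usr/bin/env bash
# Linux-only: field 5 of /proc/self/stat is the process group ID (PGID) of the
# process reading the file, here each cut process itself.
# Left side: cut runs under timeout. Right side: the reader of the pipe.
timeout 5 cut -d' ' -f5 /proc/self/stat |
  { cat; cut -d' ' -f5 /proc/self/stat; } > /tmp/pgids
# Line 1 is the left side's PGID (passed through by cat),
# line 2 is the right side's PGID.
{ read -r left; read -r right; } < /tmp/pgids
echo "left side (under timeout): PGID $left"
echo "right side (reader):       PGID $right"
if [ "$left" = "$right" ]; then
  echo "same group: timeout's kill(0, SIGTERM) would hit the reader too"
else
  echo "separate groups: timeout's kill(0, SIGTERM) stays on the left side"
fi
```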
You can make bash start a new process group for the left side with a command grouping { ... }.
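Applied to the example from the question, the grouping could look like this. It is a sketch that relies on the behaviour described above; whether bash forks timeout from the group's subshell rather than exec'ing it directly is an implementation detail that may differ between bash versions.

```shell
#!/usr/bin/env bash
# The { ...; } group on the left side runs in a subshell. If that subshell forks
# timeout (as described above), timeout is not the process-group leader, so its
# setpgid(0, 0) gives it a new group and its kill(0, SIGTERM) spares gzip.
{ timeout 1 bash -c "echo asdf; sleep 5"; } | gzip > /tmp/foo.gz
gzip -dc /tmp/foo.gz
```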
You could use timeout --foreground; however, timeout then kills only the command it started, not that command's children. So while bash dies, gzip will still wait for sleep 5, which keeps running in the background and still holds the write end of the pipe connected to gzip's stdin.
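A sketch of the --foreground approach for the case where the producer is a single process, like the program in the question. The exec in the example is an assumption added for illustration: it stands in for a producer that leaves no child processes behind, and it avoids depending on whether bash would fork sleep on its own. The process timeout signals is then the one holding the pipe, so gzip sees end-of-file after about one second.

```shell
#!/usr/bin/env bash
# With --foreground, timeout signals only the command it started and does not
# signal its process group, so gzip on the other side of the pipe is left alone.
# "exec" makes bash replace itself with sleep, so the process timeout signals is
# the one holding the write end of the pipe; nothing is left to keep gzip waiting.
start=$(date +%s)
timeout --foreground 1 bash -c "echo asdf; exec sleep 5" | gzip > /tmp/foo.gz
echo "pipeline finished after about $(( $(date +%s) - start ))s"
gzip -dc /tmp/foo.gz    # should print: asdf
```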
Guessing (also from the commit message), I think this may be intentional, so that timeout can kill a whole pipeline as if it were a magic shell built-in.
Also, the behavior depends on whether job control is enabled, so it differs between interactive and non-interactive shells.
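One consequence, offered as a sketch rather than as part of the answer itself: if the whole pipeline runs inside a non-interactive bash, job control is off there, bash does not make timeout a process-group leader, and timeout's setpgid(0, 0) separates it from gzip. Unlike the wrapping attempt in the question, the pipe to gzip sits inside the wrapper here.

```shell
#!/usr/bin/env bash
# The pipe to gzip is inside the non-interactive wrapper bash, which has no job
# control, so timeout is not a process-group leader there; its setpgid(0, 0)
# moves it (and the command it starts) into a new group, and its
# kill(0, SIGTERM) reaches only that group, not gzip.
bash -c 'timeout 1 bash -c "echo asdf; sleep 5" | gzip > /tmp/foo.gz'
gzip -dc /tmp/foo.gz    # should print: asdf
```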