Problem:
I'm writing program in golang on linux that needs to execute long running process so that:
I'm running my program with root permissions.
Attempted solution:
func Run(pathToBin string, args []string, uid uint32, stdLogFile *os.File) (int, error) {
cmd := exec.Command(pathToBin, args...)
cmd.SysProcAttr = &syscall.SysProcAttr{
Credential: &syscall.Credential{
Uid: uid,
},
}
cmd.Stdout = stdLogFile
if err := cmd.Start(); err != nil {
return -1, err
}
go func() {
cmd.Wait() //Wait is necessary so cmd doesn't become a zombie
}()
return cmd.Process.Pid, nil
}
This solution seems to satisfy almost all of my requirements except that when I send SIGTERM/SIGKILL to my program the underlying process crashes. In fact I want my background process to be as separate as possible: it has different parent pid, group pid etc. from my program. I want to run it as daemon.
Other solutions on stackoverflow suggested to use cmd.Process.Release()
for similar use cases, but it doesn't seem to work.
Solutions which are not applicable in my case:
I can in fact use library that is easily importable using import
from github etc.
The provided context is used to kill the process (by calling os. Process. Kill) if the context becomes done before the command completes on its own. After the Run() completes, you can inspect ctx.
TLDR;
Just use https://github.com/hashicorp/go-reap
There is a great Russian expression which reads "don't try to give birth to a bicycle" and it means don't reinvent the wheel and keep it simple. I think it applies here. If I were you, I'd reconsider using one of:
This issue has already been solved ;)
Your question is imprecise or you are asking for non-standard features.
In fact I want my background process to be as separate as possible: it has different parent pid, group pid etc. from my program. I want to run it as daemon.
That is not how process inheritance works. You can not have process A start Process B and somehow change the parent of B to C. To the best of my knowledge this is not possible in Linux.
In other words, if process A (pid 55) starts process B (100), then B must have parent pid 55.
The only way to avoid that is have something else start the B process such as atd, crond, or something else - which is not what you are asking for.
If parent 55 dies, then PID 1 will be the parent of 100, not some arbitrary process.
Your statement "it has different parent pid" does not makes sense.
I want to run it as daemon.
That's excellent. However, in a GNU / Linux system, all daemons have a parent pid and those parents have a parent pid going all the way up to pid 1, strictly according to the parent -> child rule.
when I send SIGTERM/SIGKILL to my program the underlying process crashes.
I can not reproduce that behavior. See case8 and case7 from the proof-of-concept repo.
make case8
export NOSIGN=1; make build case7
unset NOSIGN; make build case7
$ make case8
{ sleep 6 && killall -s KILL zignal; } &
./bin/ctrl-c &
sleep 2; killall -s TERM ctrl-c
kill with:
{ pidof ctrl-c; pidof signal ; } | xargs -r -t kill -9
main() 2476074
bashed 2476083 (2476081)
bashed 2476084 (2476081)
bashed 2476085 (2476081)
zignal 2476088 (2476090)
go main() got 23 urgent I/O condition
go main() got 23 urgent I/O condition
zignal 2476098 (2476097)
go main() got 23 urgent I/O condition
zignal 2476108 (2476099)
main() wait...
p 2476088
p 2476098
p 2476108
p 2476088
go main() got 15 terminated
sleep 1; killall -s TERM ctrl-c
p 2476098
p 2476108
p 2476088
go main() got 15 terminated
sleep 1; killall -s TERM ctrl-c
p 2476098
p 2476108
p 2476088
Bash c 2476085 EXITs ELAPSED 4
go main() got 17 child exited
go main() got 23 urgent I/O condition
main() children done: 1 %!s(<nil>)
main() wait...
go main() got 15 terminated
go main() got 23 urgent I/O condition
sleep 1; killall -s KILL ctrl-c
p 2476098
p 2476108
p 2476088
balmora: ~/src/my/go/doodles/sub-process [main]
$ p 2476098
p 2476108
Bash _ 2476083 EXITs ELAPSED 6
Bash q 2476084 EXITs ELAPSED 8
The bash processes keep running after the parent is killed.
killall -s KILL ctrl-c;
All 3 "zignal" sub-processes are running until killed by
killall -s KILL zignal;
In both cases the sub-processes continue to run despite main process being signaled with TERM, HUP, INT. This behavior is different in a shell environment because of convenience reasons. See the related questions about signals. This particular answer illustrates a key difference for SIGINT. Note that SIGSTOP and SIGKILL cannot be caught by an application.
It was necessary to clarify the above before proceeding with the other parts of the question.
So far you have already solved the following:
The next one depends on whether the children are "attached" to a shell or not
The last one is hard to reproduce, but I have heard about this problem in the docker world, so the rest of this answer is focused on addressing this issue.
As you have noted, the Cmd.Wait()
is necessary to avoid creating zombies. After some experimentation I was able to consistency produce zombies in a docker environment using an intentionally simple replacement for /bin/sh
. This "shell" implemented in go will only run a single command and not much else in terms of reaping children. You can study the code over at github.
the simple wrapper which causes zombies
package main
func main() {
Sh()
}
The reaper wrapper
package main
import (
"fmt"
"sync"
"github.com/fatih/color"
"github.com/hashicorp/go-reap"
)
func main() {
if reap.IsSupported() {
done := make(chan struct{})
var reapLock sync.RWMutex
pids := make(reap.PidCh, 1)
errors := make(reap.ErrorCh, 1)
go reap.ReapChildren(pids, errors, done, &reapLock)
go report(pids, errors, done)
Sh()
close(done)
} else {
fmt.Println("Sorry, go-reap isn't supported on your platform.")
}
}
func report(pids reap.PidCh, errors reap.ErrorCh, done chan struct{}) {
sprintf := color.New(color.FgWhite, color.Bold).SprintfFunc()
for ;; {
select {
case pid := <-pids:
println(sprintf("raeper pid %d", pid))
case err := <-errors:
println(sprintf("raeper er %s", err))
case <-done:
return
}
}
}
The init / sh (pid 1) process which runs other commands
package main
import (
"os"
"os/exec"
"strings"
"time"
"github.com/google/shlex"
"github.com/tox2ik/go-poc-reaper/fn"
)
func Sh() {
args := os.Args[1:]
script := args[0:0]
if len(args) >= 1 {
if args[0] == "-c" {
script = args[1:]
}
}
if len(script) == 0 {
fn.CyanBold("cmd: expecting sh -c 'foobar'")
os.Exit(111)
}
var cmd *exec.Cmd
parts, _ := shlex.Split(strings.Join(script, " "))
if len(parts) >= 2 {
cmd = fn.Merge(exec.Command(parts[0], parts[1:]...), nil)
}
if len(parts) == 1 {
cmd = fn.Merge(exec.Command(parts[0]), nil)
}
if fn.IfEnv("HANG") {
fn.CyanBold("cmd: %v\n start", parts)
ex := cmd.Start()
if ex != nil {
fn.CyanBold("cmd %v err: %s", parts, ex)
}
go func() {
time.Sleep(time.Millisecond * 100)
errw := cmd.Wait()
if errw != nil {
fn.CyanBold("cmd %v err: %s", parts, errw)
} else {
fn.CyanBold("cmd %v all done.", parts)
}
}()
fn.CyanBold("cmd: %v\n dispatched, hanging forever (i.e. to keep docker running)", parts)
for {
time.Sleep(time.Millisecond * time.Duration(fn.EnvInt("HANG", 2888)))
fn.SystemCyan("/bin/ps", "-e", "-o", "stat,comm,user,etime,pid,ppid")
}
} else {
if fn.IfEnv("NOWAIT") {
ex := cmd.Start()
if ex != nil {
fn.CyanBold("cmd %v start err: %s", parts, ex)
}
} else {
ex := cmd.Run()
if ex != nil {
fn.CyanBold("cmd %v run err: %s", parts, ex)
}
}
fn.CyanBold("cmd %v\n dispatched, exit docker.", parts)
}
}
The Dockerfile
FROM scratch
# for sh.go
ENV HANG ""
# for sub-process.go
ENV ABORT ""
ENV CRASH ""
ENV KILL ""
# for ctrl-c.go, signal.go
ENV NOSIGN ""
COPY bin/sh /bin/sh ## <---- wrapped or simple /bin/sh or "init"
COPY bin/sub-process /bin/sub-process
COPY bin/zleep /bin/zleep
COPY bin/fork-if /bin/fork-if
COPY --from=busybox:latest /bin/find /bin/find
COPY --from=busybox:latest /bin/ls /bin/ls
COPY --from=busybox:latest /bin/ps /bin/ps
COPY --from=busybox:latest /bin/killall /bin/killall
Remaining code / setup can be seen here:
The gist of it is we start two sub-processes from go, using the "parent" sub-process
binary. The first child is zleep
and the second fork-if
. The second one starts a "daemon" that runs a forever-loop in addition to a few short-lived threads. After a while, we kill the sub-procss
parent, forcing sh
to take over the parenting for these children.
Since this simple implementation of sh does not know how to deal with abandoned children, the children become zombies. This is standard behavior. To avoid this, init systems are usually responsible for cleaning up any such children.
Check out this repo and run the cases:
$ make prep build
$ make prep build2
The first one will use the simple /bin/sh in the docker container, and the socond one will use the same code wrapped in a reaper.
With zombies:
$ make prep build case5
(…)
main() Daemon away! 16 (/bin/zleep)
main() Daemon away! 22 (/bin/fork-if)
(…)
main() CRASH imminent
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x49e45c]
goroutine 1 [running]:
main.main()
/home/jaroslav/src/my/go/doodles/sub-process/sub-process.go:137 +0xfc
cmd [/bin/sub-process /log/case5 3 /bin/zleep 111 2 -- /dev/stderr 3 /bin/fork-if --] err: exit status 2
Child '1' done
thread done
STAT COMMAND USER ELAPSED PID PPID
R sh 0 0:02 1 0
S zleep 3 0:02 16 1
Z fork-if 3 0:02 22 1
R fork-child-A 3 0:02 25 1
R fork-child-B 3 0:02 26 25
S fork-child-C 3 0:02 27 26
S fork-daemon 3 0:02 28 27
R ps 0 0:01 30 1
Child '2' done
thread done
daemon
(…)
STAT COMMAND USER ELAPSED PID PPID
R sh 0 0:04 1 0
Z zleep 3 0:04 16 1
Z fork-if 3 0:04 22 1
Z fork-child-A 3 0:04 25 1
R fork-child-B 3 0:04 26 1
S fork-child-C 3 0:04 27 26
S fork-daemon 3 0:04 28 27
R ps 0 0:01 33 1
(…)
With reaper:
$ make -C ~/src/my/go/doodles/sub-process case5
(…)
main() CRASH imminent
(…)
Child '1' done
thread done
raeper pid 24
STAT COMMAND USER ELAPSED PID PPID
S sh 0 0:02 1 0
S zleep 3 0:01 18 1
R fork-child-A 3 0:01 27 1
R fork-child-B 3 0:01 28 27
S fork-child-C 3 0:01 30 28
S fork-daemon 3 0:01 31 30
R ps 0 0:01 32 1
Child '2' done
thread done
raeper pid 27
daemon
STAT COMMAND USER ELAPSED PID PPID
S sh 0 0:03 1 0
S zleep 3 0:02 18 1
R fork-child-B 3 0:02 28 1
S fork-child-C 3 0:02 30 28
S fork-daemon 3 0:02 31 30
R ps 0 0:01 33 1
STAT COMMAND USER ELAPSED PID PPID
S sh 0 0:03 1 0
S zleep 3 0:02 18 1
R fork-child-B 3 0:02 28 1
S fork-child-C 3 0:02 30 28
S fork-daemon 3 0:02 31 30
R ps 0 0:01 34 1
raeper pid 18
daemon
STAT COMMAND USER ELAPSED PID PPID
S sh 0 0:04 1 0
R fork-child-B 3 0:03 28 1
S fork-child-C 3 0:03 30 28
S fork-daemon 3 0:03 31 30
R ps 0 0:01 35 1
(…)
Here is a picture of the same output, which may be less confusing to read.
Zombies
Reaper
Get the code
git clone https://github.com/tox2ik/go-poc-reaper.git
One terminal:
make tail-cases
Another terminal
make prep
make build
or make build2
make case0 case1
...
Related questions:
go
signals
Related discussions:
Relevant projects:
Relevant prose:
A zombie process is a process whose execution is completed but it still has an entry in the process table. Zombie processes usually occur for child processes, as the parent process still needs to read its child’s exit status. Once this is done using the wait system call, the zombie process is eliminated from the process table. This is known as reaping the zombie process.
from https://www.tutorialspoint.com/what-is-zombie-process-in-linux
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With