I would like to cancel a running command on demand. For this I am trying exec.CommandContext; this is my current attempt:
https://play.golang.org/p/0JTD9HKvyad
package main

import (
	"context"
	"log"
	"os/exec"
	"time"
)

func Run(quit chan struct{}) {
	ctx, cancel := context.WithCancel(context.Background())
	cmd := exec.CommandContext(ctx, "sleep", "300")
	err := cmd.Start()
	if err != nil {
		log.Fatal(err)
	}
	go func() {
		log.Println("waiting cmd to exit")
		err := cmd.Wait()
		if err != nil {
			log.Println(err)
		}
	}()
	go func() {
		select {
		case <-quit:
			log.Println("calling ctx cancel")
			cancel()
		}
	}()
}

func main() {
	ch := make(chan struct{})
	Run(ch)
	select {
	case <-time.After(3 * time.Second):
		log.Println("closing via ctx")
		ch <- struct{}{}
	}
}
The problem I am facing is that cancel() is called but the process is not killed. My guess is that the main thread exits first and doesn't wait for cancel() to properly terminate the command, mainly because if I add a time.Sleep(time.Second) at the end of the main function, the program exits and the running command is killed.
Any idea how I could wait to ensure that the command has been killed before exiting, without using a sleep? Could cancel() be combined with a channel that signals once the command has been successfully killed?
In an attempt to use a single goroutine I tried this: https://play.golang.org/p/r7IuEtSM-gL but cmd.Wait() seems to block the select the whole time, so cancel() could never be called.
In Go, the program stops as soon as the end of the main function (in the main package) is reached. This behavior is described in the Go language specification, in the section on program execution (emphasis my own):
Program execution begins by initializing the main package and then invoking the function main. When that function invocation returns, the program exits. It does not wait for other (non-main) goroutines to complete.
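To see this rule in isolation, here is a minimal sketch that does not involve any child process. The function name runWorker and the 200ms delay are assumptions chosen for illustration; they do not come from the question's code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runWorker (a hypothetical helper) starts a goroutine that needs roughly
// 200ms to finish and reports whether it had finished by the time
// runWorker returns. The wait flag decides whether the caller blocks on
// the WaitGroup before returning.
func runWorker(wait bool) bool {
	var finished int32
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		time.Sleep(200 * time.Millisecond)
		atomic.StoreInt32(&finished, 1)
	}()

	if wait {
		// Block until the goroutine has called Done.
		wg.Wait()
	}
	return atomic.LoadInt32(&finished) == 1
}

func main() {
	fmt.Println("finished without waiting:", runWorker(false))
	fmt.Println("finished after waiting:", runWorker(true))
}
```

Without the wait, the caller returns long before the goroutine is done, so the first result should in practice be false; with the wait, the second result is true. If main itself returned at that point, the unfinished goroutine would simply be discarded along with the program.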
I will consider each of your examples and their associated control flow defects. You will find links to the Go playground below, but the code in these examples will not execute in the restrictive playground sandbox, as the sleep executable cannot be found. Copy and paste it into your own environment for testing.
case <-time.After(3 * time.Second):
log.Println("closing via ctx")
ch <- struct{}{}
After the timer fires and you signal to the goroutine that it is time to kill the child and stop work, nothing causes the main function to block and wait for this to complete, so it returns. In accordance with the language spec, the program exits.
Because ch is unbuffered, the send only completes once the goroutine has received from it, but there is then a race between main exiting and that goroutine going on to call cancel() and the command actually being killed. It is unsafe to assume any particular interleaving of behavior and, for practical purposes, unlikely that any useful work will happen before main quits. The sleep child process will be orphaned; on Unix systems, the operating system will normally re-parent the process onto the init process.
In your single goroutine attempt, you have the opposite problem: main does not return, so the child process is not killed. This situation is only resolved when the child process exits (after 5 minutes). This occurs because:
cmd.Wait in the Run function is a blocking call (docs). The select statement is blocked waiting for cmd.Wait to return an error value, so it cannot receive from the quit channel.
The quit channel (declared as ch in main) is an unbuffered channel. Send operations on unbuffered channels block until a receiver is ready to receive the data. From the language spec on channels (again, emphasis my own):
The capacity, in number of elements, sets the size of the buffer in the channel. If the capacity is zero or absent, the channel is unbuffered and communication succeeds only when both a sender and receiver are ready.
As Run is blocked in cmd.Wait, there is no ready receiver for the value sent on the channel by the ch <- struct{}{} statement in the main function. main blocks waiting to send this value, which prevents the process from exiting.
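The difference between an unbuffered and a buffered send can be checked in isolation with a non-blocking send. This is only a sketch: trySend is a hypothetical helper introduced here, not part of the question's code.

```go
package main

import "fmt"

// trySend (a hypothetical helper) attempts a non-blocking send on ch and
// reports whether the value was accepted immediately. The default case is
// taken when a plain send would have blocked.
func trySend(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan struct{})
	buffered := make(chan struct{}, 1)

	// No goroutine is receiving, so an unbuffered send cannot proceed.
	fmt.Println("unbuffered, no receiver:", trySend(unbuffered)) // false

	// A buffered channel accepts values until its buffer is full.
	fmt.Println("buffered, empty buffer:", trySend(buffered)) // true
	fmt.Println("buffered, full buffer:", trySend(buffered))  // false
}
```

A plain ch <- struct{}{} in the first situation would block indefinitely, which is the state main ends up in while Run is stuck in cmd.Wait.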
We can demonstrate both issues with minor code tweaks.
cmd.Wait is blocking
To expose the blocking nature of cmd.Wait, declare the following function and use it in place of the Wait call. This function is a wrapper with the same behavior as cmd.Wait, but with additional side effects that print what is happening to STDOUT. (Playground link):
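// waitOn wraps cmd.Wait and prints when the call starts and when it returns.
// It requires "fmt" to be added to the imports.
func waitOn(cmd *exec.Cmd) error {
	fmt.Printf("Waiting on command %p\n", cmd)
	err := cmd.Wait()
	fmt.Printf("Returning from waitOn %p\n", cmd)
	return err
}

// Change the select statement call to cmd.Wait to use the wrapper
case e <- waitOn(cmd):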
Upon running this modified program, you will observe the output Waiting on command <pointer> on the console. After the timers fire, you will observe the output calling ctx cancel, but no corresponding Returning from waitOn <pointer> text. This will only appear when the child process exits, which you can observe quickly by reducing the sleep duration to a smaller number of seconds (I chose 5 seconds).
The send on ch blocks
main cannot return because the signal channel used to propagate the quit request is unbuffered and there is no corresponding listener. By changing the line:
ch := make(chan struct{})
to
ch := make(chan struct{}, 1)
the send on the channel in main will proceed (to the channel's buffer) and main will quit, the same behavior as the multiple goroutine example. However, this implementation is still broken: the value will not be read from the channel's buffer to actually start stopping the child process before main returns, so the child process will still be orphaned.
I have produced a fixed version for you, code below. There are also some stylistic improvements to convert your example to more idiomatic Go:
Indirection via a channel to signal when it is time to stop is unnecessary. Instead, we can avoid declaring a channel by hoisting the declaration of the context and cancellation function to the main function. The context can be cancelled directly at the appropriate time.
I have retained the separate Run function to demonstrate passing the context in this way, but in many cases its logic could be embedded into the main function, with a goroutine spawned to perform the blocking cmd.Wait call.
The select statement in the main function is unnecessary as it only has one case statement.
sync.WaitGroup is introduced to explicitly solve the problem of main exiting before the child process (waited on in a separate goroutine) has been killed. The wait group implements a counter; the call to Wait blocks until all goroutines have finished working and called Done.
package main

import (
	"context"
	"log"
	"os/exec"
	"sync"
	"time"
)

func Run(ctx context.Context) {
	cmd := exec.CommandContext(ctx, "sleep", "300")
	err := cmd.Start()
	if err != nil {
		// Run could also return this error and push the program
		// termination decision to the `main` function.
		log.Fatal(err)
	}
	err = cmd.Wait()
	if err != nil {
		log.Println("waiting on cmd:", err)
	}
}

func main() {
	var wg sync.WaitGroup
	ctx, cancel := context.WithCancel(context.Background())

	// Increment the WaitGroup synchronously in the main function, to avoid
	// racing with the goroutine starting.
	wg.Add(1)
	go func() {
		Run(ctx)
		// Signal the goroutine has completed
		wg.Done()
	}()

	<-time.After(3 * time.Second)
	log.Println("closing via ctx")
	cancel()

	// Wait for the child goroutine to finish, which will only occur when
	// the child process has stopped and the call to cmd.Wait has returned.
	// This prevents main() exiting prematurely.
	wg.Wait()
}
(Playground link)
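Since the question asks whether a channel could signal that the command has been killed, here is an alternative sketch of the same fix that uses a result channel instead of a sync.WaitGroup. It is not the version above: runAndReport and killAfter are hypothetical names introduced for illustration, and the sketch assumes a sleep executable is available on the PATH.

```go
package main

import (
	"context"
	"log"
	"os/exec"
	"time"
)

// runAndReport (hypothetical) starts the command and returns a channel
// that receives the result of cmd.Wait exactly once.
func runAndReport(ctx context.Context, name string, args ...string) (<-chan error, error) {
	cmd := exec.CommandContext(ctx, name, args...)
	if err := cmd.Start(); err != nil {
		// Leave the termination decision to the caller.
		return nil, err
	}
	// Buffered, so the goroutine can deliver the result and exit even if
	// nobody ever reads from the channel.
	done := make(chan error, 1)
	go func() {
		done <- cmd.Wait()
	}()
	return done, nil
}

// killAfter (hypothetical) runs the command, cancels the context after
// delay unless the command has already exited, and returns only once
// cmd.Wait has returned.
func killAfter(delay time.Duration, name string, args ...string) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	done, err := runAndReport(ctx, name, args...)
	if err != nil {
		return err
	}
	select {
	case err := <-done:
		// The command exited on its own before the delay elapsed.
		return err
	case <-time.After(delay):
		log.Println("closing via ctx")
		cancel()
		// Block until the child has stopped and Wait has returned.
		return <-done
	}
}

func main() {
	err := killAfter(3*time.Second, "sleep", "300")
	log.Println("command finished:", err)
}
```

Because cmd.Wait runs in its own goroutine, the select stays free to react to the timer, and receiving from done plays the role that wg.Wait plays in the version above. After a cancellation the returned error is expected to be non-nil, since the child was killed rather than exiting normally.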