Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to call cancel() when using exec.CommandContext in a goroutine

Tags:

go

I would like to cancel on demand a running command, for this, I am trying, exec.CommandContext, currently trying this:

https://play.golang.org/p/0JTD9HKvyad

package main

import (
    "context"
    "log"
    "os/exec"
    "time"
)

func Run(quit chan struct{}) {
    ctx, cancel := context.WithCancel(context.Background())
    cmd := exec.CommandContext(ctx, "sleep", "300")
    err := cmd.Start()
    if err != nil {
        log.Fatal(err)
    }

    go func() {
        log.Println("waiting cmd to exit")
        err := cmd.Wait()
        if err != nil {
            log.Println(err)
        }
    }()

    go func() {
        select {
        case <-quit:
            log.Println("calling ctx cancel")
            cancel()
        }
    }()
}

func main() {
    ch := make(chan struct{})
    Run(ch)
    select {
    case <-time.After(3 * time.Second):
        log.Println("closing via ctx")
        ch <- struct{}{}
    }
}

The problem that I am facing is that the cancel() is called but the process is not being killed, my guess is that the main thread exit first and don't wait for the cancel() to properly terminate the command, mainly because If I use a time.Sleep(time.Second) at the end of the main function it exits/kills the running command.

Any idea about how could I wait to ensure that the command has been killed before exiting not using a sleep? could the cancel() be used in a channel after successfully has killed the command?

In a try to use a single goroutine I tried with this: https://play.golang.org/p/r7IuEtSM-gL but the cmd.Wait() seems to be blocking all the time the select and was not available to call the cancel()

like image 691
nbari Avatar asked Jan 28 '23 10:01

nbari


1 Answers

In Go, the program will stop if the end of the main method (in the main package) is reached. This behavior is described in the Go language specification under a section on program execution (emphasis my own):

Program execution begins by initializing the main package and then invoking the function main. When that function invocation returns, the program exits. It does not wait for other (non-main) goroutines to complete.


Defects

I will consider each of your examples and their associated control flow defects. You will find links to the Go playground below, but the code in these examples will not execute in the restrictive playground sandbox as the sleep executable cannot be found. Copy and paste to your own environment for testing.

Multiple goroutine example

case <-time.After(3 * time.Second):
        log.Println("closing via ctx")
        ch <- struct{}{}

After the timer fires and you signal to the goroutine it is time to kill the child and stop work, there is nothing to cause the main method to block and wait for this to complete, so it returns. In accordance with the language spec, the program exits.

The scheduler may fire after the channel transmit, so there may be a race may between main exiting and the other goroutines waking up to receive from ch. However, it is unsafe to assume any particular interleaving of behavior – and, for practical purposes, unlikely that any useful work will happen before main quits. The sleep child process will be orphaned; on Unix systems, the operating system will normally re-parent the process onto the init process.

Single goroutine example

Here, you have the opposite problem: main does not return, so the child process is not killed. This situation is only resolved when the child process exits (after 5 minutes). This occurs because:

  • The call to cmd.Wait in the Run method is a blocking call (docs). The select statement is blocked waiting for cmd.Wait to return an error value, so cannot receive from the quit channel.
  • The quit channel (declared as ch in main) is an unbuffered channel. Send operations on unbuffered channels will block until a receiver is ready to receive the data. From the language spec on channels (again, emphasis my own):

    The capacity, in number of elements, sets the size of the buffer in the channel. If the capacity is zero or absent, the channel is unbuffered and communication succeeds only when both a sender and receiver are ready.

    As Run is blocked in cmd.Wait, there is no ready receiver to receive the value transmitted on the channel by the ch <- struct{}{} statement in the main method. main blocks waiting to transmit this data, which prevents the process returning.

We can demonstrate both issues with minor code tweaks.

cmd.Wait is blocking

To expose the blocking nature of cmd.Wait, declare the following function and use it in place of the Wait call. This function is a wrapper with the same behavior as cmd.Wait, but additional side-effects to print what is happening to STDOUT. (Playground link):

func waitOn(cmd *exec.Cmd) error {
    fmt.Printf("Waiting on command %p\n", cmd)
    err := cmd.Wait()
    fmt.Printf("Returning from waitOn %p\n", cmd)
    return err
}

// Change the select statement call to cmd.Wait to use the wrapper
case e <- waitOn(cmd):

Upon running this modified program, you will observe the output Waiting on command <pointer> to the console. After the timers fire, you will observe the output calling ctx cancel, but no corresponding Returning from waitOn <pointer> text. This will only occur when the child process returns, which you can observe quickly by reducing the sleep duration to a smaller number of seconds (I chose 5 seconds).

Send on the quit channel, ch, blocks

main cannot return because the signal channel used to propagate the quit request is unbuffered and there is no corresponding listener. By changing the line:

    ch := make(chan struct{})

to

    ch := make(chan struct{}, 1)

the send on the channel in main will proceed (to the channel's buffer) and main will quit – the same behavior as the multiple goroutine example. However, this implementation is still broken: the value will not be read from the channel's buffer to actually start stopping the child process before main returns, so the child process will still be orphaned.


Fixed version

I have produced a fixed version for you, code below. There are also some stylistic improvements to convert your example to more idiomatic go:

  • Indirection via a channel to signal when it is time to stop is unnecessary. Instead, we can avoid declaring a channel by hoisting declaration of the context and cancellation function to the main method. The context can be cancelled directly at the appropriate time.

    I have retained the separate Run function to demonstrate passing the context in this way, but in many cases, its logic could be embedded into the main method, with a goroutine spawned to perform the cmd.Wait blocking call.

  • The select statement in the main method is unnecessary as it only has one case statement.
  • sync.WaitGroup is introduced to explicitly solve the problem of main exiting before the child process (waited on in a separate goroutine) has been killed. The wait group implements a counter; the call to Wait blocks until all goroutines have finished working and called Done.
package main

import (
    "context"
    "log"
    "os/exec"
    "sync"
    "time"
)

func Run(ctx context.Context) {
    cmd := exec.CommandContext(ctx, "sleep", "300")
    err := cmd.Start()
    if err != nil {
        // Run could also return this error and push the program
        // termination decision to the `main` method.
        log.Fatal(err)
    }

    err = cmd.Wait()
    if err != nil {
        log.Println("waiting on cmd:", err)
    }
}

func main() {
    var wg sync.WaitGroup
    ctx, cancel := context.WithCancel(context.Background())

    // Increment the WaitGroup synchronously in the main method, to avoid
    // racing with the goroutine starting.
    wg.Add(1)
    go func() {
        Run(ctx)
        // Signal the goroutine has completed
        wg.Done()
    }()

    <-time.After(3 * time.Second)
    log.Println("closing via ctx")
    cancel()

    // Wait for the child goroutine to finish, which will only occur when
    // the child process has stopped and the call to cmd.Wait has returned.
    // This prevents main() exiting prematurely.
    wg.Wait()
}

(Playground link)

like image 120
Cosmic Ossifrage Avatar answered May 22 '23 13:05

Cosmic Ossifrage