Are go-routines pre-emptively multitasked for numerical problems?
I am very intrigued by Go's lean design and speed, but most of all by the fact that channels are first-class objects. I hope that last point may enable a whole new class of deep-analysis algorithms for big data, via the complex interconnection patterns they should allow.
My problem domain requires real-time, compute-bound analysis of streaming incoming data. The data can be partitioned into roughly 100 to 1,000 "problems", each of which takes between 10 and 1,000 seconds to compute (i.e., their granularity is highly variable). All results must be available before the output makes sense: if, say, 500 problems come in, all 500 must be solved before I can use any of them. The application must be able to scale, potentially to thousands (though unlikely hundreds of thousands) of problems.
Given that I am less worried about numerical library support (most of this stuff is custom), Go seems ideal, as I can map each problem to a goroutine. Before I invest in learning Go rather than, say, Julia, Rust, or a functional language (none of which, as far as I can see, have first-class channels, so for me they are at an immediate disadvantage), I need to know whether goroutines are properly pre-emptively multitasked. That is, if I run 500 compute-bound goroutines on a powerful multicore machine, can I expect reasonable load balancing across all the "problems", or will I have to cooperatively "yield" all the time, 1995-style? This issue is particularly important given the variable granularity of the problems and the fact that, during a computation, I usually will not know how much longer it will take.
If another language would serve me better, I am happy to hear about it, but I have a requirement that threads (or go/coroutines) of execution be lightweight. Python's multiprocessing module, for example, is far too resource-intensive for my scaling ambitions. Just to pre-empt the obvious: I do understand the difference between parallelism and concurrency.
Goroutines are useful when you want to do multiple things simultaneously: if you have ten tasks to run at the same time, you can run each in its own goroutine and wait for all of them to finish. Goroutines also start faster than OS threads, and they come with a built-in primitive for communicating safely between themselves, the channel, which threads lack; two goroutines can exchange data through a channel without hand-rolled synchronization.
The Go runtime has a model where multiple goroutines are mapped onto multiple threads automatically. No goroutine is bound to a particular thread; the scheduler may (and will) schedule goroutines onto the next available thread. The number of threads a Go program uses for running goroutines is taken from the GOMAXPROCS environment variable and can be overridden with runtime.GOMAXPROCS(). This is a simplified description, but it is sufficient for understanding.
Goroutines may yield in cases such as:
- blocking I/O, e.g. io.Read()
- an operation that might have to wait for other goroutines, such as acquiring a mutex or sending to or receiving from a channel
- a function call (since Go 1.2 the scheduler may preempt a goroutine at function entry)
- an explicit call to runtime.Gosched()
Conversely, a tight loop that does none of the above (pure computation with no function calls) can prevent a goroutine from yielding under this cooperative model. (Note: since Go 1.14 the runtime preempts such loops asynchronously, so long-running compute-bound goroutines no longer starve the others.)
Not sure I fully understand you; however, you can set runtime.GOMAXPROCS to scale to all processors, then use channels (or locks) to synchronize the data. Example:
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

const N = 100

func main() {
	runtime.GOMAXPROCS(runtime.NumCPU()) // scale to all processors
	var stuff [N]bool
	var wg sync.WaitGroup
	ch := make(chan int, runtime.NumCPU())
	done := make(chan struct{}, runtime.NumCPU())
	finished := make(chan struct{})

	// collector: record which workers completed
	go func() {
		for i := range ch {
			stuff[i] = true
		}
		close(finished) // tell main that every result has been recorded
	}()

	wg.Add(N)
	for i := range stuff {
		go func(i int) {
			defer wg.Done()
			for { // cpu-bound loop
				select {
				case <-done:
					fmt.Println(i, "is done")
					ch <- i
					return
				default:
				}
			}
		}(i)
	}

	// release one worker at a time
	go func() {
		for range stuff {
			time.Sleep(time.Microsecond)
			done <- struct{}{}
		}
		close(done)
	}()

	wg.Wait()
	close(ch)
	<-finished // wait for the collector, avoiding a data race on stuff
	for i, v := range stuff {
		if !v {
			panic(fmt.Sprintf("%d != true", i))
		}
	}
	fmt.Println("All done")
}
EDIT: Information about the scheduler @ http://tip.golang.org/src/pkg/runtime/proc.c
Goroutine scheduler
The scheduler's job is to distribute ready-to-run goroutines over worker threads.
The main concepts are:
- G - goroutine.
- M - worker thread, or machine.
- P - processor, a resource that is required to execute Go code. M must have an associated P to execute Go code, however it can be blocked or in a syscall w/o an associated P.
Design doc at http://golang.org/s/go11sched.