Goroutines are light-weight processes that are automatically time-sliced onto one or more operating system threads by the Go runtime. (This is a really cool feature of Go!)
Suppose I have a concurrent application such as a web server. Plenty is happening concurrently in my hypothetical program, and only a small fraction of the work is inherently sequential (in the Amdahl's Law sense).
It seems that the default number of operating system threads in use is currently 1. Does this mean that only one CPU core gets used?
If I start my program with
runtime.GOMAXPROCS(runtime.NumCPU())
will that give reasonably efficient use of all the cores on my PC?
Is there any "parallel slackness" benefit from having even more OS threads in use, e.g. via some heuristic
runtime.GOMAXPROCS(runtime.NumCPU() * 2)
?
From the Go FAQ:
Why doesn't my multi-goroutine program use multiple CPUs?
You must set the GOMAXPROCS shell environment variable or use the similarly-named function of the runtime package to allow the run-time support to utilize more than one OS thread.
Programs that perform parallel computation should benefit from an increase in GOMAXPROCS. However, be aware that concurrency is not parallelism.
(UPDATE 8/28/2015: Go 1.5 is set to make the default value of GOMAXPROCS the same as the number of CPUs on your machine, so this shouldn't be a problem anymore)
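To see what your own Go version defaults to, you can ask the runtime directly. This is a minimal sketch; currentProcs is just an illustrative helper name, not something from the FAQ. It relies on the documented behaviour that runtime.GOMAXPROCS called with an argument below 1 only reports the current setting.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentProcs (hypothetical helper) reports the current GOMAXPROCS
// setting. An argument below 1 queries the value without changing it.
func currentProcs() int {
	return runtime.GOMAXPROCS(0)
}

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:", currentProcs())
}
```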
And
Why does using GOMAXPROCS > 1 sometimes make my program slower?
It depends on the nature of your program. Problems that are intrinsically sequential cannot be sped up by adding more goroutines. Concurrency only becomes parallelism when the problem is intrinsically parallel.
In practical terms, programs that spend more time communicating on channels than doing computation will experience performance degradation when using multiple OS threads. This is because sending data between threads involves switching contexts, which has significant cost. For instance, the prime sieve example from the Go specification has no significant parallelism although it launches many goroutines; increasing GOMAXPROCS is more likely to slow it down than to speed it up.
Go's goroutine scheduler is not as good as it needs to be. In future, it should recognize such cases and optimize its use of OS threads. For now, GOMAXPROCS should be set on a per-application basis.
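For reference, the prime sieve the FAQ mentions looks roughly like the sketch below. I've wrapped the classic version in a firstPrimes function (my own name, not from the original example) so it returns a slice instead of printing forever. Note how every candidate number is handed from goroutine to goroutine over an unbuffered channel while each goroutine does only a single modulo operation: almost all the time goes into communication, which is exactly the case the FAQ warns about.

```go
package main

import "fmt"

// generate sends 2, 3, 4, ... on ch forever.
func generate(ch chan<- int) {
	for i := 2; ; i++ {
		ch <- i
	}
}

// filter copies values from in to out, dropping multiples of prime.
func filter(in <-chan int, out chan<- int, prime int) {
	for {
		i := <-in
		if i%prime != 0 {
			out <- i
		}
	}
}

// firstPrimes (hypothetical wrapper) chains one filter goroutine per
// prime found. The goroutines are never stopped; they simply stay
// blocked once nobody reads from the last channel, which is acceptable
// for a short demo but would be a leak in a long-running program.
func firstPrimes(n int) []int {
	var primes []int
	ch := make(chan int)
	go generate(ch)
	for len(primes) < n {
		prime := <-ch
		primes = append(primes, prime)
		next := make(chan int)
		go filter(ch, next, prime)
		ch = next
	}
	return primes
}

func main() {
	fmt.Println(firstPrimes(10)) // prints [2 3 5 7 11 13 17 19 23 29]
}
```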
In short: it is very difficult to get Go to make "efficient use of all your cores". Simply spawning a billion goroutines and increasing GOMAXPROCS is just as likely to degrade your performance as to speed it up, because the runtime will be switching thread contexts all the time. If you have a large program that is parallelizable, then increasing GOMAXPROCS to the number of parallel components works fine. If you have a parallel problem embedded in a largely non-parallel program, it may speed up, or you may have to make creative use of functions like runtime.LockOSThread() to ensure the runtime distributes everything correctly (generally speaking, Go just spreads the currently runnable goroutines more or less evenly among all active threads).
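As an illustration of the "parallelizable" case, here is a minimal sketch of CPU-bound work split into a few large independent chunks, one goroutine per chunk, with the worker count taken from GOMAXPROCS. The parallelSum helper and the summing workload are made up for the example. The point is only that each goroutine computes on its own data and communicates once at the end, rather than chatting over channels for every item.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelSum (hypothetical helper) splits nums into at most `workers`
// contiguous chunks and sums each chunk in its own goroutine.
func parallelSum(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	chunk := (len(nums) + workers - 1) / workers // ceiling division
	if chunk == 0 {
		return 0 // empty input
	}
	partial := make([]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		if lo >= len(nums) {
			break // more workers than chunks
		}
		hi := lo + chunk
		if hi > len(nums) {
			hi = len(nums)
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			sum := 0 // accumulate locally, write the result once
			for _, v := range nums[lo:hi] {
				sum += v
			}
			partial[w] = sum
		}(w, lo, hi)
	}
	wg.Wait()
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	nums := make([]int, 1000)
	for i := range nums {
		nums[i] = i + 1
	}
	// One worker per thread allowed to run Go code simultaneously.
	fmt.Println(parallelSum(nums, runtime.GOMAXPROCS(0))) // prints 500500
}
```

Whether this actually beats a plain loop depends on how much work each chunk carries; for a tiny slice like the one in main, the goroutine overhead will dominate.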
Also, GOMAXPROCS isn't strictly equal to the number of threads: it limits how many OS threads may execute Go code at the same time. As far as I know the runtime does not clamp it to NumCPU, so a value greater than NumCPU is accepted, although it is unlikely to help CPU-bound work. I'm not 100% sure of exactly when the runtime decides to spawn new threads, but threads blocked in system calls do not count against the limit, and another instance is when the number of blocking goroutines using runtime.LockOSThread() is greater than or equal to GOMAXPROCS; in that case it will spawn more threads than cores so it can keep the rest of the program running sanely.
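Whether a value above NumCPU gets clamped is easy to check by setting it and reading it back. The sketch below does that and then restores the previous setting; clampsToNumCPU is a name I made up for the experiment. On the Go releases I'm aware of it reports false, i.e. the larger value is kept as-is.

```go
package main

import (
	"fmt"
	"runtime"
)

// clampsToNumCPU (hypothetical helper) sets GOMAXPROCS to twice the
// number of logical CPUs, reads the setting back, and reports whether
// the runtime reduced it. The previous setting is restored on return.
func clampsToNumCPU() bool {
	want := runtime.NumCPU() * 2
	prev := runtime.GOMAXPROCS(want) // returns the previous setting
	defer runtime.GOMAXPROCS(prev)
	got := runtime.GOMAXPROCS(0) // query only
	return got < want
}

func main() {
	fmt.Println("clamped to NumCPU:", clampsToNumCPU())
}
```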
Basically, it's quite simple to increase GOMAXPROCS and make Go use all cores of your CPU. It's quite another thing, at this point in Go's development, to get it to use all those cores smartly and efficiently; that takes a lot of program design and finagling to get right.