I'm doing a test: compare excecution times of cgo and pure Go functions run 100 million times each. The cgo function takes longer time compared to the Golang function, and I am confused with this result. My testing code is:
package main
import (
"fmt"
"time"
)
/*
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void show() {
}
*/
// #cgo LDFLAGS: -lstdc++
import "C"
//import "fmt"
func show() {
}
func main() {
now := time.Now()
for i := 0; i < 100000000; i = i + 1 {
C.show()
}
end_time := time.Now()
var dur_time time.Duration = end_time.Sub(now)
var elapsed_min float64 = dur_time.Minutes()
var elapsed_sec float64 = dur_time.Seconds()
var elapsed_nano int64 = dur_time.Nanoseconds()
fmt.Printf("cgo show function elasped %f minutes or \nelapsed %f seconds or \nelapsed %d nanoseconds\n",
elapsed_min, elapsed_sec, elapsed_nano)
now = time.Now()
for i := 0; i < 100000000; i = i + 1 {
show()
}
end_time = time.Now()
dur_time = end_time.Sub(now)
elapsed_min = dur_time.Minutes()
elapsed_sec = dur_time.Seconds()
elapsed_nano = dur_time.Nanoseconds()
fmt.Printf("go show function elasped %f minutes or \nelapsed %f seconds or \nelapsed %d nanoseconds\n",
elapsed_min, elapsed_sec, elapsed_nano)
var input string
fmt.Scanln(&input)
}
and result is:
cgo show function elasped 0.368096 minutes or
elapsed 22.085756 seconds or
elapsed 22085755775 nanoseconds
go show function elasped 0.000654 minutes or
elapsed 0.039257 seconds or
elapsed 39257120 nanoseconds
The results show that invoking the C function is slower than the Go function. Is there something wrong with my testing code?
My system is : mac OS X 10.9.4 (13E28)
As you've discovered, there is fairly high overhead in calling C/C++ code via CGo. So in general, you are best off trying to minimise the number of CGo calls you make. For the above example, rather than calling a CGo function repeatedly in a loop it might make sense to move the loop down to C.
There are a number of aspects of how the Go runtime sets up its threads that can break the expectations of many pieces of C code:
libpthread
's thread local storage implementation.For these reasons, CGo picks the safe approach of running the C code in a separate thread set up with a traditional stack.
If you are coming from languages like Python where it isn't uncommon to rewrite code hotspots in C as a way to speed up a program you will be disappointed. But at the same time, there is a much smaller gap in performance between equivalent C and Go code.
In general I reserve CGo for interfacing with existing libraries, possibly with small C wrapper functions that can reduce the number of calls I need to make from Go.
Update for James's answer: it seems that there's no thread switch in current implementation.
See this thread on golang-nuts:
There's always going to be some overhead. It's more expensive than a simple function call but significantly less expensive than a context switch (agl is remembering an earlier implementation; we cut out the thread switch before the public release). Right now the expense is basically just having to do a full register set switch (no kernel involvement). I'd guess it's comparable to ten function calls.
See also this answer which links "cgo is not Go" blog post.
C doesn’t know anything about Go’s calling convention or growable stacks, so a call down to C code must record all the details of the goroutine stack, switch to the C stack, and run C code which has no knowledge of how it was invoked, or the larger Go runtime in charge of the program.
Thus, cgo has an overhead because it performs a stack switch, not thread switch.
It saves and restores all registers when C function is called, while it's not required when Go function or assembly function is called.
Besides that, cgo's calling conventions forbid passing Go pointers directly to C code, and common workaround is to use C.malloc
, and so introduce additional allocations. See this question for details.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With