I'm a newbie to golang, trying to rewrite my java server project in golang.
I found, passing pointers into channel cause almost 30% performance drop compared to passing values.
Here is a sample snippet: package main import ( "time" "fmt" )
var c = make(chan t, 1024)
// var c = make(chan *t, 1024)
type t struct {
a uint
b uint
}
func main() {
start := time.Now()
for i := 0; i < 1000; i++ {
b := t{a:3, b:5}
// c <- &b
c <- b
}
elapsed := time.Since(start)
fmt.Println(t2)
}
As a value it can be stack allocated:
go run -gcflags '-m' tmp.go
# command-line-arguments
./tmp.go:18: inlining call to time.Time.Nanosecond
./tmp.go:24: inlining call to time.Time.Nanosecond
./tmp.go:25: t2 escapes to heap
./tmp.go:25: main ... argument does not escape
63613
As a pointer, it escapes to the heap:
go run -gcflags '-m' tmp.go
# command-line-arguments
./tmp.go:18: inlining call to time.Time.Nanosecond
./tmp.go:24: inlining call to time.Time.Nanosecond
./tmp.go:21: &b escapes to heap <-- Additional GC pressure
./tmp.go:20: moved to heap: b <--
./tmp.go:25: t2 escapes to heap
./tmp.go:25: main ... argument does not escape
122513
Escaping to the heap introduces some overhead / GC pressure.
Looking at the assembly, the pointer version also introduces additional instructions, including:
go run -gcflags '-S' tmp.go
0x0055 00085 (...tmp.go:18) CALL runtime.newobject(SB)
The non-pointer variant doesn't incur this overhead before calling runtime.chansend1
.
As a supplement to the good analysis of Martin Gallagher, it must be added that the way you are measuring is suspicious. The performance of such tiny programs varies a lot, so measuring should be done repeatedly. There are also some mistakes in your example.
First: it doesn't compile because the package statement is missing.
Second: there is an important difference between Nanoseconds
and Nanosecond
I tried to evaluate your observation this way*:
package main
import (
"time"
"fmt"
)
const (
chan_size = 1000
cycle_count = 1000
)
var (
v_ch = make(chan t, chan_size)
p_ch = make(chan *t, chan_size)
)
type t struct {
a uint
b uint
}
func fill_v() {
for i := 0; i < chan_size; i++ {
b := t{a:3, b:5}
v_ch <- b
}
}
func fill_p() {
for i := 0; i < chan_size; i++ {
b := t{a:3, b:5}
p_ch <- &b
}
}
func measure_f(f func()) int64 {
start := time.Now()
f();
elapsed := time.Since(start)
return elapsed.Nanoseconds()
}
func main() {
var v_nanos int64 = 0
var p_nanos int64 = 0
for i := 0; i<cycle_count; i++ {
v_nanos += measure_f(fill_v);
for i := 0; i < chan_size; i++ {
_ = <- v_ch
}
}
for i := 0; i<cycle_count; i++ {
p_nanos += measure_f(fill_p);
for i := 0; i < chan_size; i++ {
_ = <- p_ch
}
}
fmt.Println(
"v:",v_nanos/cycle_count,
" p:", p_nanos/cycle_count,
"ratio (v/p):", float64(v_nanos)/float64(p_nanos))
}
There is indeed a measurable performance drop (I define drop like this drop=1-(candidate/optimum)
), but although I repeat the code 1000 times, it varies between 25% and 50%, I'm not even sure how the heap is recycled and when, so it maybe hard to quantify at all.
*see a "running" demo on ideone
...note that stdout is frozen: v: 34875 p: 59420 ratio (v/p)0.586923845267128
For some reason, it was not possible to run this code in the Go Playground
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With