At the moment I am implementing some sorting algorithms. As is in the nature of such algorithms, there are a lot of calls that take the length of arrays/slices using the built-in len() function.
Now, given the following code for (part of) the Mergesort algorithm:
for len(left) > 0 || len(right) > 0 {
    if len(left) > 0 && len(right) > 0 {
        if left[0] <= right[0] {
            result = append(result, left[0])
            left = left[1:len(left)]
        } else {
            result = append(result, right[0])
            right = right[1:len(right)]
        }
    } else if len(left) > 0 {
        result = append(result, left[0])
        left = left[1:len(left)]
    } else if len(right) > 0 {
        result = append(result, right[0])
        right = right[1:len(right)]
    }
}
My question is: do these multiple len() calls affect the performance of the algorithm negatively? Is it better to make temporary variables for the lengths of the right and left slices? Or does the compiler do this itself?
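To make the alternative concrete, here is roughly what I mean by using temporary variables, wrapped in a function so it stands alone (a sketch only; merge, leftLen and rightLen are names I made up for illustration):

func merge(left, right []int) []int {
    result := make([]int, 0, len(left)+len(right))
    // Hoist the lengths into locals; they must be kept in sync
    // every time the slices are re-sliced.
    leftLen, rightLen := len(left), len(right)
    for leftLen > 0 || rightLen > 0 {
        if leftLen > 0 && rightLen > 0 {
            if left[0] <= right[0] {
                result = append(result, left[0])
                left = left[1:]
                leftLen--
            } else {
                result = append(result, right[0])
                right = right[1:]
                rightLen--
            }
        } else if leftLen > 0 {
            result = append(result, left[0])
            left = left[1:]
            leftLen--
        } else {
            result = append(result, right[0])
            right = right[1:]
            rightLen--
        }
    }
    return result
}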
There are two cases:
For locally defined slices, the compiler caches the length (typically in a register), so there is no runtime overhead. You can see this in the assembly of the following program:
func generateSlice(x int) []int {
    return make([]int, x)
}

func main() {
    x := generateSlice(10)
    println(len(x))
}
Compiled with go tool 6g -S test.go, this yields, amongst other things, the following lines:
MOVQ "".x+40(SP),BX
MOVQ BX,(SP)
// ...
CALL ,runtime.printint(SB)
What happens here is that the first line retrieves the length of x by getting the value located 40 bytes from the beginning of x and, most importantly, caches this value in BX, which is then used for every occurrence of len(x). The reason for the offset is that a slice (the runtime's Array type) has the following structure (source):
typedef struct {        // must not move anything
    uchar array[8];     // pointer to data
    uchar nel[4];       // number of elements
    uchar cap[4];       // allocated number of elements
} Array;
nel is what is accessed by len(). You can see this in the code generation as well.
For shared (e.g. global) values, caching the length is not possible, since the compiler has to assume that the slice may change between calls. Therefore the compiler emits code that accesses the length attribute directly every time. Example:
func accessLocal() int {
    a := make([]int, 1000) // local
    count := 0
    for i := 0; i < len(a); i++ {
        count += len(a)
    }
    return count
}

var ag = make([]int, 1000) // global

func accessGlobal() int {
    count := 0
    for i := 0; i < len(ag); i++ {
        count += len(ag)
    }
    return count
}
Comparing the assembly of both functions reveals the crucial difference: as soon as the variable is global, the access to the nel attribute is no longer cached, and there will be a runtime overhead:
// accessLocal
MOVQ "".a+8048(SP),SI // cache length in SI
// ...
CMPQ SI,AX // i < len(a)
// ...
MOVQ SI,BX
ADDQ CX,BX
MOVQ BX,CX // count += len(a)
// accessGlobal
MOVQ "".ag+8(SB),BX
CMPQ BX,AX // i < len(ag)
// ...
MOVQ "".ag+8(SB),BX
ADDQ CX,BX
MOVQ BX,CX // count += len(ag)
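If the global access really matters, one workaround is to copy the length into a local before the loop, which lets the compiler keep it in a register. This is a sketch under the assumption that nothing resizes ag while the loop runs; accessGlobalCached is a made-up name:

// accessGlobalCached hoists the global's length into a local so the
// compiler can keep it in a register for the duration of the loop.
func accessGlobalCached() int {
    count := 0
    n := len(ag) // assumes ag is not re-sliced or reassigned concurrently
    for i := 0; i < n; i++ {
        count += n
    }
    return count
}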
Despite the good answers you are getting, I'm getting poorer performance when calling len(a) repeatedly, for example in this test: http://play.golang.org/p/fiP1Sy2Hfk
package main

import "testing"

func BenchmarkTest1(b *testing.B) {
    a := make([]int, 1000)
    for i := 0; i < b.N; i++ {
        count := 0
        for i := 0; i < len(a); i++ {
            count += len(a)
        }
    }
}

func BenchmarkTest2(b *testing.B) {
    a := make([]int, 1000)
    for i := 0; i < b.N; i++ {
        count := 0
        lena := len(a)
        for i := 0; i < lena; i++ {
            count += lena
        }
    }
}
When run as go test -bench=. I get:
BenchmarkTest1 5000000 668 ns/op
BenchmarkTest2 5000000 402 ns/op
So there is clearly a penalty here, possibly because the compiler is missing an optimization at compile time.
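As a side note, since count is never read after the inner loop, an aggressive compiler could in principle eliminate the work and skew the comparison. A sketch that guards against that by writing the result to a package-level sink (sink and BenchmarkTest2Sink are illustrative names):

var sink int // package-level sink keeps the loop's result observable

func BenchmarkTest2Sink(b *testing.B) {
    a := make([]int, 1000)
    for i := 0; i < b.N; i++ {
        count := 0
        lena := len(a)
        for j := 0; j < lena; j++ {
            count += lena
        }
        sink = count // prevents dead-code elimination of the loop
    }
}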