The BenchmarkTools documentation recommends interpolating global variables into benchmarking expressions. However, the gap in run times for the example that they provide seems to have closed considerably. In their example, they have a global variable A = rand(1000), and they compare @benchmark [i*i for i in A] to @benchmark [i*i for i in $A], and get 13.806 μs versus 1.348 μs, respectively. However, when I run that example now, the run times are very close:
julia> using Statistics, BenchmarkTools
julia> A = rand(1000);
julia> median(@benchmark [i*i for i in A])
BenchmarkTools.TrialEstimate:
time: 892.821 ns
gctime: 0.000 ns (0.00%)
memory: 7.95 KiB
allocs: 2
julia> median(@benchmark [i*i for i in $A])
BenchmarkTools.TrialEstimate:
time: 836.075 ns
gctime: 0.000 ns (0.00%)
memory: 7.95 KiB
allocs: 2
Here's my version info:
julia> versioninfo()
Julia Version 1.1.1
Commit 55e36cc (2019-05-16 04:10 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin15.6.0)
CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Is interpolation in benchmarks still necessary? Any idea why the run times are so similar now? Can anyone provide a different example where the run times are different by a factor much greater than one?
BenchmarkTools is in an arms race against the compiler — on multiple fronts!
The difference between the two expressions is equivalent to the difference between these two functions:
# @benchmark [i*i for i in A]
f1() = [i*i for i in A]
# @benchmark [i*i for i in $A]
f2(X) = [i*i for i in X]
In other words, using a $ treats the value as an argument instead of a hard-coded constant or global. Since A is a global but not constant, f1() is type-unstable. Of course, Julia has been getting better and better at dealing with type-instabilities, and it appears that this is yet another place where you're no longer paying the cost for it.
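To see the type-instability without timing anything, you can ask inference what each function is expected to return. This is only a sketch: Base.return_types is an unexported helper, and the exact types it reports may differ between Julia versions. The point is just that f2 can be inferred from its argument type, while f1 has nothing to go on because the global A could be rebound to anything.

```julia
A = rand(1000)            # non-constant global, as in the question

f1() = [i*i for i in A]   # reads the global directly
f2(X) = [i*i for i in X]  # receives the array as an argument

# Ask inference for the return type of each method.
# Base.return_types is unexported; treat its output as diagnostic only.
rt1 = Base.return_types(f1, Tuple{})
rt2 = Base.return_types(f2, Tuple{Vector{Float64}})

println("f1 inferred as: ", rt1)
println("f2 inferred as: ", rt2)
```

Both functions compute the same result; only what the compiler can prove ahead of time differs. Interactively, @code_warntype f1() shows the same information in more detail.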
There are times where not using a $ will actually give deceptively fast results, because Julia will hard-code the value and may do some sort of constant propagation that over-specializes on the exact value you're benchmarking. Here's an example that shows both directions:
julia> x = 0.5; # non-constant global
julia> @btime sin(x);
20.106 ns (1 allocation: 16 bytes)
julia> @btime sin($x);
5.413 ns (0 allocations: 0 bytes)
julia> @btime sin(0.5); # constant literal!
1.818 ns (0 allocations: 0 bytes)
julia> @btime sin($0.5);
5.416 ns (0 allocations: 0 bytes)
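If you suspect the compiler is still folding an interpolated value into a constant, the BenchmarkTools manual suggests wrapping the value in a Ref and dereferencing it inside the benchmarked expression. A minimal sketch of that pattern, assuming BenchmarkTools is installed (timings are omitted here because they depend on the machine and Julia version):

```julia
using BenchmarkTools

x = 0.5   # non-constant global, as in the example above

# Plain interpolation: x is passed in as if it were a function argument.
r1 = @btime sin($x)

# Ref interpolation: the value is only read at run time via [],
# which makes it harder for the compiler to precompute sin(0.5).
r2 = @btime sin($(Ref(x))[])
```

Both calls return the same value as sin(0.5); only the measured time may differ.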