I heard that being conscious of type stability contributes a lot to high performance in Julia, so I tried to measure how much time I could save by rewriting a type-unstable function into a type-stable version. Like many people, I assumed that type-stable code would of course perform better than type-unstable code. However, the result was the opposite:
# type-unstable vs type-stable
# type-unstable
function positive(x)
    if x < 0
        return 0.0
    else
        return x
    end
end

# type-stable
function positive_safe(x)
    if x < 0
        return zero(x)
    else
        return x
    end
end

@time for n in 1:100_000_000
    a = 2^( positive(-n) + 1 )
end
@time for n in 1:100_000_000
    b = 2^( positive_safe(-n) + 1 )
end
result:
0.040080 seconds
0.150596 seconds
I cannot believe this. Are there any mistakes in my code, or is this actually the case?
Any information would be appreciated.
@btime for n in 1:100_000_000
    a = 2^( positive(-n) + 1 )
end
# -> 1.500 ns
@btime for n in 1:100_000_000
    b = 2^( positive_safe(-n) + 1 )
end
# -> 503.146 ms
Still weird.
using BenchmarkTools
@btime 2^(positive(-n) + 1) setup=(n=rand(1:10^8))
# -> 32.435 ns (0 allocations: 0 bytes)
@btime 2^(positive_safe(-n) + 1) setup=(n=rand(1:10^8))
# -> 3.103 ns (0 allocations: 0 bytes)
Works as expected.
I still don't understand what is happening.
I feel like I need to learn more about the usage of @btime and the benchmarking process.
By the way, I'm running these benchmarks in JupyterLab.
- Type stability is when a function's return type depends only on its argument types.
- Type groundedness is when every variable's type depends only on the argument types.
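A quick practical way to check this is @code_warntype, which highlights unstable inferred types. A minimal sketch, using the type-unstable function from the question:

```julia
# Type-unstable version from the question: returns Float64 for
# negative inputs, but keeps the input type otherwise.
function positive(x)
    if x < 0
        return 0.0
    else
        return x
    end
end

# For an Int argument, inference produces the return type
# Union{Float64, Int64}, which @code_warntype flags in red:
@code_warntype positive(-1)
```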
The problem with your benchmark is that you are testing code with different logic:
2 ^ (integer value)
and
2 ^ (float value)
But the most crucial part: if a and b are not defined before the loop, the Julia compiler may remove the block entirely. Your performance depends very much on whether a and b were defined beforehand, and whether they were defined in the global scope or not.
And the power operation is the time-consuming central part of your code, not the type-unstable part. (In your case, the positive function returns a Float64, while positive_safe returns an Int.)
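To make that concrete, here is a small check of the types actually involved, using the definitions from the question:

```julia
function positive(x)
    x < 0 ? 0.0 : x
end

function positive_safe(x)
    x < 0 ? zero(x) : x
end

# The two benchmarks exercise different power methods:
typeof(positive(-1))                # Float64
typeof(positive_safe(-1))           # Int64
typeof(2^(positive(-1) + 1))        # Float64 -- floating-point power
typeof(2^(positive_safe(-1) + 1))   # Int64   -- integer power
```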
Code similar to your case (in terms of logic) could look like this:
# type-unstable
function positive(x)
    if x < 0
        return 0.0
    else
        return x
    end
end

# type-stable
function positive_safe(x)
    if x < 0
        return 0.0
    else
        return Float64(x)
    end
end

function test1()
    a = 0.0
    for n in 1:100_000_000
        a += 2^( positive(-n) + 1 )
    end
    a
end

function test2()
    b = 0.0
    for n in 1:100_000_000
        b += 2^( positive_safe(-n) + 1 )
    end
    b
end
@btime test1()
@btime test2()
98.045 ms (0 allocations: 0 bytes)
2.0e8
97.948 ms (0 allocations: 0 bytes)
2.0e8
The results are almost the same, since the type instability is not the bottleneck in this case.
If we instead test a function that discards its result (similar to your case, where a/b were not defined):
function test3()
    b = 0.0
    for n in 1:100_000_000
        b += 2^( positive_safe(-n) + 1 )
    end
    nothing
end
@btime test3()
the benchmark will show:
1.611 ns
This is not because my laptop performed 100_000_000 iterations in 1.611 ns, but because the Julia compiler is smart enough to understand that test3 can be replaced with nothing.
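If you do want to benchmark a loop that discards its results, one option (assuming Julia 1.8 or newer, where Base.donotdelete is available) is to mark the value as must-compute; a sketch:

```julia
function positive_safe(x)
    x < 0 ? zero(x) : x
end

function test4()
    for n in 1:100_000_000
        b = 2^( positive_safe(-n) + 1 )
        # Tell the compiler that b must actually be computed,
        # preventing the loop body from being eliminated:
        Base.donotdelete(b)
    end
    nothing
end
```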
This is a benchmarking problem. The @time macro is not suitable for microbenchmarks. Use the BenchmarkTools.jl package, and read its user manual; it is easy to make mistakes when benchmarking.
Here's how to do it:
jl> using BenchmarkTools
jl> @btime 2^(positive(-n) + 1) setup=(n=rand(1:10^8))
6.507 ns (0 allocations: 0 bytes)
2.0
jl> @btime 2^(positive_safe(-n) + 1) setup=(n=rand(1:10^8))
2.100 ns (0 allocations: 0 bytes)
2
As you see, the type stable function is faster.
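One more BenchmarkTools detail worth knowing: when the benchmarked expression references a global variable, interpolate it with $ so that the cost of accessing the untyped global is not part of the measurement. A sketch, using the positive_safe definition from the question:

```julia
using BenchmarkTools

function positive_safe(x)
    x < 0 ? zero(x) : x
end

x = -12345  # a non-constant global

# Without $, the benchmark includes the untyped global access
# and the resulting dynamic dispatch:
@btime 2^(positive_safe(x) + 1)

# With $, the value is spliced in as a typed argument, so only
# the computation itself is measured:
@btime 2^(positive_safe($x) + 1)
```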