I have 2 functions for determining pi numerically in Julia. The second function (which I think is vectorized) is slower than the first. Why is vectorization slower? Are there rules when to vectorize and when not to?
function determine_pi(n)
area = zeros(Float64, n);
sum = 0;
for i=1:n
if ((rand()^2+rand()^2) <=1)
sum = sum + 1;
end
area[i] = sum*1.0/i;
end
return area
end
and another function
function determine_pi_vec(n)
res = cumsum(map(x -> x<=1?1:0, rand(n).^2+rand(n).^2))./[1:n]
return res
end
When run for n=10^7, below are execution times (after running few times)
n=10^7
@time returnArray = determine_pi(n)
#output elapsed time: 0.183211324 seconds (80000128 bytes allocated)
@time returnArray2 = determine_pi_vec(n);
#elapsed time: 2.436501454 seconds (880001336 bytes allocated, 30.71% gc time)
Vectorization is good if
In general, I personally find it best to start with vectorized code, look for any speed problems, then devectorize any troublesome issues.
Your second code is slow not so much due to it being vectorized, but due to the use of an anonymous function: unfortunately in Julia 0.3, these are normally quite a bit slower. map
in general doesn't perform very well, I believe because Julia can't infer the output type of the function (its still "anonymous" from the perspective of the map
function). I wrote a different vectorized version which avoids anonymous functions, and is possibly a little bit easier to read:
function determine_pi_vec2(n)
return cumsum((rand(n).^2 .+ rand(n).^2) .<= 1) ./ (1:n)
end
Benchmarking with
function bench(n, f)
f(10)
srand(1000)
@time f(n)
srand(1000)
@time f(n)
srand(1000)
@time f(n)
end
bench(10^8, determine_pi)
gc()
bench(10^8, determine_pi_vec)
gc()
bench(10^8, determine_pi_vec2)
gives me the results
elapsed time: 5.996090409 seconds (800000064 bytes allocated)
elapsed time: 6.028323688 seconds (800000064 bytes allocated)
elapsed time: 6.172004807 seconds (800000064 bytes allocated)
elapsed time: 14.09414031 seconds (8800005224 bytes allocated, 7.69% gc time)
elapsed time: 14.323797823 seconds (8800001272 bytes allocated, 8.61% gc time)
elapsed time: 14.048216404 seconds (8800001272 bytes allocated, 8.46% gc time)
elapsed time: 8.906563284 seconds (5612510776 bytes allocated, 3.21% gc time)
elapsed time: 8.939001114 seconds (5612506184 bytes allocated, 4.25% gc time)
elapsed time: 9.028656043 seconds (5612506184 bytes allocated, 4.23% gc time)
so vectorized code can definitely be about as good as devectorized in some cases, even when we aren't in a the linear-algebra case.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With