Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

when is vectorization favored in Julia?

I have 2 functions for determining pi numerically in Julia. The second function (which I think is vectorized) is slower than the first. Why is vectorization slower? Are there rules when to vectorize and when not to?

function determine_pi(n)
    area = zeros(Float64, n);
    sum = 0;
    for i=1:n
        if ((rand()^2+rand()^2) <=1)
            sum = sum + 1;
        end
            area[i] = sum*1.0/i;
    end
    return area
end

and another function

function determine_pi_vec(n)
    res = cumsum(map(x -> x<=1?1:0, rand(n).^2+rand(n).^2))./[1:n]
    return res
end

When run for n=10^7, below are execution times (after running few times)

n=10^7
@time returnArray = determine_pi(n)
#output elapsed time: 0.183211324 seconds (80000128 bytes allocated)
@time returnArray2 = determine_pi_vec(n);
#elapsed time: 2.436501454 seconds (880001336 bytes allocated, 30.71% gc time)
like image 541
VeryLucky XYZ Avatar asked Feb 21 '15 16:02

VeryLucky XYZ


1 Answers

Vectorization is good if

  • It makes the code easier to read, and performance isn't critical
  • If its a linear algebra operation, using a vectorized style can be good because Julia can use BLAS and LAPACK to perform your operation with very specialized, high-performance code.

In general, I personally find it best to start with vectorized code, look for any speed problems, then devectorize any troublesome issues.

Your second code is slow not so much due to it being vectorized, but due to the use of an anonymous function: unfortunately in Julia 0.3, these are normally quite a bit slower. map in general doesn't perform very well, I believe because Julia can't infer the output type of the function (its still "anonymous" from the perspective of the map function). I wrote a different vectorized version which avoids anonymous functions, and is possibly a little bit easier to read:

function determine_pi_vec2(n)
    return cumsum((rand(n).^2 .+ rand(n).^2) .<= 1) ./ (1:n)
end

Benchmarking with

function bench(n, f)
  f(10)
  srand(1000)
  @time f(n)
  srand(1000)
  @time f(n)
  srand(1000)
  @time f(n)
end

bench(10^8, determine_pi)
gc()
bench(10^8, determine_pi_vec)
gc()
bench(10^8, determine_pi_vec2)

gives me the results

elapsed time: 5.996090409 seconds (800000064 bytes allocated)
elapsed time: 6.028323688 seconds (800000064 bytes allocated)
elapsed time: 6.172004807 seconds (800000064 bytes allocated)
elapsed time: 14.09414031 seconds (8800005224 bytes allocated, 7.69% gc time)
elapsed time: 14.323797823 seconds (8800001272 bytes allocated, 8.61% gc time)
elapsed time: 14.048216404 seconds (8800001272 bytes allocated, 8.46% gc time)
elapsed time: 8.906563284 seconds (5612510776 bytes allocated, 3.21% gc time)
elapsed time: 8.939001114 seconds (5612506184 bytes allocated, 4.25% gc time)
elapsed time: 9.028656043 seconds (5612506184 bytes allocated, 4.23% gc time)

so vectorized code can definitely be about as good as devectorized in some cases, even when we aren't in a the linear-algebra case.

like image 86
IainDunning Avatar answered Nov 12 '22 08:11

IainDunning