List of vectors slower in Julia than R?

I tried to speed up an R function by porting it to Julia, but to my surprise Julia was slower. The function sequentially updates a list of vectors (array of arrays in Julia). Beforehand the index of the list element to be updated is unknown and the length of the new vector is unknown. I have written a test function that demonstrates the behavior.

Julia

function MyTest(n)
  a = [[0.0] for i in 1:n]
    for i in 1:n
      a[i] = cumsum(ones(i))
    end  
  a
end

R

MyTest <- function(n){
  a <- as.list(rep(0, n))
  for (i in 1:n) 
    a[[i]] <- cumsum(rep(1, i))
  a
}

By setting n to 5000, 10000 and 20000, typical computing times are (median of 21 tests):

  • R: 0.14, 0.45, and 1.28 seconds
  • Julia: 0.31, 3.38, and 27.03 seconds

I used a Windows laptop with 64-bit Julia 1.3.1 and 64-bit R 3.6.1.

Both of these functions use 64-bit floating-point numbers. My real problem involves integers, where R is even more favorable, but an integer comparison isn't fair since R uses 32-bit integers and Julia 64-bit ones. Is there something I can do to speed up Julia, or is Julia really much slower than R in this case?
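
As a diagnostic (a minimal sketch using only Base, no packages; `mytest_orig` is just the question's function renamed), note that `cumsum(ones(i))` allocates two arrays per iteration: one for `ones(i)` and one for `cumsum`'s result. The Base macro `@allocated` makes this visible:

```julia
# The question's function, renamed for illustration.
# Each loop iteration allocates twice: once for ones(i),
# once for the vector returned by cumsum.
function mytest_orig(n)
    a = [[0.0] for i in 1:n]
    for i in 1:n
        a[i] = cumsum(ones(i))
    end
    a
end

# @allocated (in Base) reports the bytes allocated by an expression.
println(@allocated mytest_orig(5000))
```

The large allocation count this reports is a first hint at where the time goes, which the answer below exploits.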

asked Jan 10 '20 by Øyvind Langsrud


1 Answer

I can't quite reproduce your test results. Assuming you want 32-bit integers, as you said, we have:

julia> function mytest(n)
           a = Vector{Vector{Int32}}(undef, n)
           for i in 1:n
               a[i] = cumsum(ones(i))
           end

           return a
       end
mytest (generic function with 1 method)

julia> @btime mytest(20000);
  1.108 s (111810 allocations: 3.73 GiB)

Just getting rid of those intermediate allocations already brings us down to the following:

julia> function mytest(n)
           a = Vector{Vector{Int32}}(undef, n)
           @inbounds for i in 1:n
               a[i] = collect(UnitRange{Int32}(1, i))
           end

           return a
       end
mytest (generic function with 1 method)

julia> @btime mytest(20000);
  115.702 ms (35906 allocations: 765.40 MiB)

Further devectorization does not even help; it is actually slower:

julia> function mytest(n)
           a = Vector{Vector{Int32}}(undef, n)
           @inbounds for i in 1:n
               v = Vector{Int32}(undef, i)
               v[1] = 1
               @inbounds for j = 2:i
                   v[j] = v[j-1] + 1
               end
               a[i] = v
           end

           return a
       end
mytest (generic function with 1 method)

julia> @btime mytest(20000);
  188.856 ms (35906 allocations: 765.40 MiB)

But with a couple of threads (assuming the inner arrays are independent), we get a 2x speed-up over the devectorized version:

julia> Threads.nthreads()
4

julia> function mytest(n)
           a = Vector{Vector{Int32}}(undef, n)
           Threads.@threads for i in 1:n
               v = Vector{Int32}(undef, i)
               v[1] = 1
               @inbounds for j = 2:i
                   v[j] = v[j-1] + 1
               end
               a[i] = v
           end

           return a
       end
mytest (generic function with 1 method)

julia> @btime mytest(20000);
  99.718 ms (35891 allocations: 763.13 MiB)

But this is only about as fast as the second variant above.

That is all for the specific case of cumsum. Other inner functions are slower, of course, but they can be threaded and optimized in the same ways, with possibly different results.
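
The pattern generalizes: the threaded loop can take the per-index function as an argument. This `fillvectors` helper is a hypothetical sketch, not part of the answer above; it assumes `f(i)` returns a vector of element type `T` and that iterations are independent:

```julia
# Sketch of a generic threaded filler: builds a Vector of Vectors
# by calling f(i) for each index, distributing iterations over threads.
function fillvectors(f, ::Type{T}, n) where {T}
    a = Vector{Vector{T}}(undef, n)
    Threads.@threads for i in 1:n
        a[i] = f(i)
    end
    return a
end

# The cumsum case from above, expressed through the helper:
fillvectors(i -> collect(Int32(1):Int32(i)), Int32, 10)
```

With `Threads.nthreads() == 1` this degrades gracefully to the serial loop, so it is safe to use regardless of how Julia was started.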

(This is on Julia 1.2, 12 GiB RAM, and an older i7.)

answered Nov 12 '22 by phipsgabler