Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Adding a scalar to a matrix efficiently in Julia

Tags:

math

matrix

julia

I need to add a scalar to all elements of a huge matrix. The matrix will be as big as possible. In the example I will use a size of 2 GiB but in my real computation it will be much larger.

A = rand(2^14, 2^14)

If I execute

A += 1.0

Julia allocates an additional 2 GiB of memory. The operation takes about 1s. I could use a for loop:

for jj = 1:size(A, 2), ii = 1:size(A, 1)
  A[ii, jj] = A[ii, jj] + 1.0
end

This does not allocate any memory, but it takes one minute. Both approaches are not viable for me, because the first one violates memory constraints and the second is clearly inefficient. For element-wise multiplication there is scal!, which uses BLAS. Is there any way of performing addition as effciently as multiplication using scal!?

like image 458
Diphtong Avatar asked Dec 01 '22 15:12

Diphtong


2 Answers

@DSM's answer is a good one. There are a number of things going on here that I'd like to address in addition, however. The reason your for loop is slow is because A is a non-constant global variable and your code is directly mutating that global. Since A is non-constant, the code has to guard against the possibility of A becoming a different value with a different type at any point during the execution of the loop. The code has to look up the type and location of A on every iteration of the loop and dynamically dispatch the method calls in the expression A[ii, jj] = A[ii, jj] + 1.0 – that's a call to getindex, + and setindex!, all of which depend on the statically unknown type of A. You can immediately get much better performance just by doing this work in a function:

julia> A = rand(2^10, 2^10);

julia> @time for jj = 1:size(A, 2), ii = 1:size(A, 1)
           A[ii, jj] += 1
       end
elapsed time: 0.288340785 seconds (84048040 bytes allocated, 15.59% gc time)

julia> function inc!(A)
           for jj = 1:size(A, 2), ii = 1:size(A, 1)
               A[ii, jj] += 1
           end
       end
inc! (generic function with 1 method)

julia> @time inc!(A)
elapsed time: 0.006076414 seconds (171336 bytes allocated)

julia> @time inc!(A)
elapsed time: 0.000888457 seconds (80 bytes allocated)

Avoiding non-constant globals like this is the first recommendation in the Performance Tips section of the manual. You'll probably want to peruse the rest of this chapter as well.

We can further improve the performance of the inc! function using the @inbounds annotation to indicate that bounds checks aren't necessary for this code, and by using linear indexing instead of two-dimensional indexing:

julia> function inc!(A)
           @inbounds for i = 1:length(A)
               A[i] += 1
           end
       end
inc! (generic function with 1 method)

julia> @time inc!(A)
elapsed time: 0.000637934 seconds (80 bytes allocated)

Most of the speedup is from the @inbounds annotation rather than the linear indexing, although that does give a little speed boost. The @inbounds annotation should be used sparingly, however, and only where one is both certain that the indexing cannot be out of bounds and performance is of the utmost importance. As you can see, the additional performance improvement while existent, is not overwhelming. Most of the benefit comes from not directly mutating global variables.

like image 150
StefanKarpinski Avatar answered Dec 15 '22 14:12

StefanKarpinski


You could do an in-place broadcast operation:

julia> A = rand(2^14, 2^14); A[1:5, 1:5]
5x5 Array{Float64,2}:
 0.229662  0.680236    0.131202  0.111664   0.802698
 0.500575  0.580994    0.385844  0.983806   0.324382
 0.701694  0.577749    0.532591  0.0508955  0.94325 
 0.592929  0.00319653  0.759241  0.448704   0.706204
 0.867945  0.0413606   0.586151  0.82561    0.679233

julia> @time broadcast!(.+, A, A, 100);
elapsed time: 0.382669486 seconds (11490976 bytes allocated)

julia> A[1:5, 1:5]
5x5 Array{Float64,2}:
 100.23   100.68   100.131  100.112  100.803
 100.501  100.581  100.386  100.984  100.324
 100.702  100.578  100.533  100.051  100.943
 100.593  100.003  100.759  100.449  100.706
 100.868  100.041  100.586  100.826  100.679

which only uses a total of ~2G of memory.

like image 43
DSM Avatar answered Dec 15 '22 13:12

DSM