I need to add a scalar to every element of a huge matrix. The matrix should be as large as memory allows; in this example I will use a 2 GiB matrix, but in my real computation it will be much larger.
A = rand(2^14, 2^14)
If I execute
A += 1.0
Julia allocates an additional 2 GiB of memory. The operation takes about 1 s. I could use a for loop:
for jj = 1:size(A, 2), ii = 1:size(A, 1)
    A[ii, jj] = A[ii, jj] + 1.0
end
This does not allocate any memory, but it takes one minute. Neither approach is viable for me: the first violates my memory constraints and the second is clearly inefficient. For element-wise multiplication there is scal!, which uses BLAS. Is there any way of performing addition as efficiently as scal! performs multiplication?
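For reference, the in-place scaling I have in mind looks something like the call below. This is only a sketch: the module path differs between Julia versions (Base.LinAlg.BLAS on older releases, LinearAlgebra.BLAS on 1.0 and later), and the 2.0 factor is just an example.

Base.LinAlg.BLAS.scal!(length(A), 2.0, A, 1)  # multiplies every element of A by 2.0 in place, without allocating a copy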
@DSM's answer is a good one. However, there are a number of additional things going on here that I'd like to address. The reason your for loop is slow is that A is a non-constant global variable and your code is directly mutating that global. Since A is non-constant, the code has to guard against the possibility of A becoming a different value with a different type at any point during the execution of the loop. It therefore has to look up the type and location of A on every iteration of the loop and dynamically dispatch the method calls in the expression A[ii, jj] = A[ii, jj] + 1.0: that's a call to getindex, +, and setindex!, all of which depend on the statically unknown type of A. You can immediately get much better performance just by doing this work in a function:
julia> A = rand(2^10, 2^10);
julia> @time for jj = 1:size(A, 2), ii = 1:size(A, 1)
           A[ii, jj] += 1
       end
elapsed time: 0.288340785 seconds (84048040 bytes allocated, 15.59% gc time)
julia> function inc!(A)
           for jj = 1:size(A, 2), ii = 1:size(A, 1)
               A[ii, jj] += 1
           end
       end
inc! (generic function with 1 method)
julia> @time inc!(A)
elapsed time: 0.006076414 seconds (171336 bytes allocated)
julia> @time inc!(A)
elapsed time: 0.000888457 seconds (80 bytes allocated)
Avoiding non-constant globals like this is the first recommendation in the Performance Tips section of the manual. You'll probably want to peruse the rest of this chapter as well.
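To illustrate the flip side of that tip: if you really do want to work with a global, declaring it const fixes its type, so the same top-level loop no longer pays the dynamic-dispatch penalty. A rough sketch (the name B is only for illustration):

const B = rand(2^10, 2^10)  # const global: the compiler can rely on B's type
for jj = 1:size(B, 2), ii = 1:size(B, 1)
    B[ii, jj] += 1.0        # getindex, + and setindex! now dispatch statically
end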
We can further improve the performance of the inc! function by using the @inbounds annotation to indicate that bounds checks aren't necessary for this code, and by using linear indexing instead of two-dimensional indexing:
julia> function inc!(A)
           @inbounds for i = 1:length(A)
               A[i] += 1
           end
       end
inc! (generic function with 1 method)
julia> @time inc!(A)
elapsed time: 0.000637934 seconds (80 bytes allocated)
Most of the speedup is from the @inbounds annotation rather than the linear indexing, although that does give a little speed boost. The @inbounds annotation should be used sparingly, however, and only where one is certain both that the indexing cannot go out of bounds and that performance is of the utmost importance. As you can see, the additional performance improvement, while real, is not overwhelming: most of the benefit comes from not directly mutating global variables.
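To tie this back to the original question of adding an arbitrary scalar rather than 1, here is a sketch of the same pattern as a mutating function; the name add_scalar! is mine, not a standard one:

function add_scalar!(A, x)
    # adds x to every element of A in place; no temporary matrix is allocated
    @inbounds for i = 1:length(A)
        A[i] += x
    end
    return A
end

add_scalar!(A, 1.0)  # usage: mutates A directly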
You could do an in-place broadcast operation:
julia> A = rand(2^14, 2^14); A[1:5, 1:5]
5x5 Array{Float64,2}:
0.229662 0.680236 0.131202 0.111664 0.802698
0.500575 0.580994 0.385844 0.983806 0.324382
0.701694 0.577749 0.532591 0.0508955 0.94325
0.592929 0.00319653 0.759241 0.448704 0.706204
0.867945 0.0413606 0.586151 0.82561 0.679233
julia> @time broadcast!(.+, A, A, 100);
elapsed time: 0.382669486 seconds (11490976 bytes allocated)
julia> A[1:5, 1:5]
5x5 Array{Float64,2}:
100.23 100.68 100.131 100.112 100.803
100.501 100.581 100.386 100.984 100.324
100.702 100.578 100.533 100.051 100.943
100.593 100.003 100.759 100.449 100.706
100.868 100.041 100.586 100.826 100.679
which only uses a total of ~2 GiB of memory.
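On newer Julia versions (0.6 and later, which is newer than the release shown in the timings above), the fused dot syntax expresses the same in-place broadcast more compactly:

A .+= 100  # lowers to an in-place broadcast!, so no 2 GiB temporary is allocated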