I'm curious why Julias implementation of matrix addition appears to make a copy. Heres an example:
foo1=rand(1000,1000)
foo2=rand(1000,1000)
foo3=rand(1000,1000)
julia> @time foo1=foo2+foo3;
0.001719 seconds (9 allocations: 7.630 MB)
julia> sizeof(foo1)/10^6
8.0
The amount of memory allocated is roughly the same as the memory required by a matrix of these dimensions.
It looks like in order to process foo2+foo3 memory is allocated to store the result and then foo1 is assigned to it by reference.
Does this mean that for most linear algebra operations we need to call BLAS and LAPACK functions directly to do things in place?
To understand what is going on here, let's consider what foo1 = foo2 + foo3
actually does.
foo2 + foo3
. To do this it will allocate a new temporary array to hold the outputfoo1
to this new temporary array, undoing all effort you put in to pre-allocate the output array. In short, you see that memory usage is about that of the resultant array because the routine is indeed allocating new memory for an array of that size.
Here are some alternatives:
broadcast!
copy!(foo1, foo2+foo3)
and then the array you pre-allocated will be filled, but it will still allocate the temporary (see below)Here's some code for those 4 cases
julia> function with_loop!(foo1, foo2, foo3)
for i in eachindex(foo2)
foo1[i] = foo2[i] + foo3[i]
end
end
julia> function with_broadcast!(foo1, foo2, foo3)
broadcast!(+, foo1, foo2, foo3)
end
julia> function with_copy!(foo1, foo2, foo3)
copy!(foo1, foo2+foo3)
end
julia> function original(foo1, foo2, foo3)
foo1 = foo2 + foo3
end
Now let's time these functions
julia> for f in [:with_broadcast!, :with_loop!, :with_copy!, :original]
@eval $f(foo1, foo2, foo3) # compile
println("timing $f")
@eval @time $f(foo1, foo2, foo3)
end
timing with_broadcast!
0.001787 seconds (5 allocations: 192 bytes)
timing with_loop!
0.001783 seconds (4 allocations: 160 bytes)
timing with_copy!
0.003604 seconds (9 allocations: 7.630 MB)
timing original
0.002702 seconds (9 allocations: 7.630 MB, 97.91% gc time)
You can see that with_loop!
and broadcast!
do about the same and both are much faster and more efficient than the others. with_copy!
and original
are both slower and use more memory.
In general, to do inplace operations I'd recommend starting out by writing a loop
First, read @spencerlyon2's answer. Another approach is to use Dahua Lin's package Devectorize.jl
. It defines the @devec
macro which automates the translations of vector (array) expressions into looped code.
In this example we will define with_devec!(foo1,foo2,foo3)
as follows,
julia> using Devectorize # install with Pkg.add("Devectorize")
julia> with_devec!(foo1,foo2,foo3) = @devec foo1[:]=foo2+foo3
Running the benchmark achieves the 4 allocations results.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With