I am trying to minimize memory allocations in Julia by pre-allocating arrays as shown in the documentation. My sample code looks as follows:
using BenchmarkTools
dim1 = 100
dim2 = 1000
A = rand(dim1,dim2)
B = rand(dim1,dim2)
C = rand(dim1,dim2)
D = rand(dim1,dim2)
M = Array{Float64}(undef,dim1,dim2)
function calc!(a, b, c, d, E)
    @. E = a * b * ((d-c)/d)
    nothing
end
function run_calc(A,B,C,D,M)
    for i in 1:dim2
        @views calc!(A[:,i], B[:,i], C[:,i], D[:,i], M[:,i])
    end
end
My understanding is that this should essentially not allocate, since M is pre-allocated outside either of the two functions. However, when I benchmark this I still see a lot of allocations:
@btime run_calc(A,B,C,D,M)
1.209 ms (14424 allocations: 397.27 KiB)
In this case I can of course run the much more concise
@btime @. M = A * B * ((D-C)/D)
which performs very few allocations as expected:
122.599 μs (6 allocations: 144 bytes)
However my actual code is more complex and cannot be reduced like this, hence I am wondering where I am going wrong with the first version.
You are not doing anything wrong. Creating views in Julia can currently allocate (as Stefan noted, this has gotten much better than in the past, but some allocations still seem to happen in this case). The allocations you see are a consequence of this.
See:
julia> @allocated view(M, 1:10, 1:10)
64
Your case is one of the situations where it is simplest to just write an appropriate loop (I assume that in your code the loop will be more complex but I hope the intent is clear), e.g.:
julia> function run_calc2(A,B,C,D,M)
           @inbounds for i in eachindex(A,B,C,D,M)
               M[i] = A[i] * B[i] * ((D[i] - C[i])/D[i])
           end
       end
run_calc2 (generic function with 1 method)
julia> @btime run_calc2($A,$B,$C,$D,$M)
56.441 μs (0 allocations: 0 bytes)
julia> @btime run_calc($A,$B,$C,$D,$M)
893.789 μs (14424 allocations: 397.27 KiB)
julia> @btime @. $M = $A * $B * (($D-$C)/$D);
381.745 μs (0 allocations: 0 bytes)
EDIT: all timings on Julia Version 1.6.0-DEV.1580
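If the real computation has to work column by column, the same loop idea can be kept while indexing into the matrices directly instead of creating views. The following is only a sketch under that assumption; `calc_col!` and `run_calc_cols!` are hypothetical names that do not appear in the question, and the timings above do not cover it:

```julia
# Hypothetical helper: fills column j of E using explicit indices,
# so no view objects are created.
function calc_col!(E, a, b, c, d, j)
    @inbounds for i in axes(E, 1)
        E[i, j] = a[i, j] * b[i, j] * ((d[i, j] - c[i, j]) / d[i, j])
    end
    return nothing
end

# Hypothetical driver: loops over the columns of M.
function run_calc_cols!(M, A, B, C, D)
    # @inbounds is used in calc_col!, so check the sizes up front.
    size(A) == size(B) == size(C) == size(D) == size(M) ||
        throw(DimensionMismatch("all arrays must have the same size"))
    # The column range comes from the array itself, not from a global.
    for j in axes(M, 2)
        calc_col!(M, A, B, C, D, j)
    end
    return M
end
```

It would be called as `run_calc_cols!(M, A, B, C, D)` and should be benchmarked with interpolated arguments (`$M`, `$A`, ...) in the same way as the examples above.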
EDIT2: for completeness, here is code that moves @views down into the inner function. It still allocates (though less) and is still slower than using just the loop:
julia> function calc2!(a, b, c, d, E, i)
           @inbounds @. @views E[:,i] = a[:,i] * b[:,i] * ((d[:,i]-c[:,i])/d[:,i])
           nothing
       end
calc2! (generic function with 1 method)
julia> function run_calc3(A,B,C,D,M)
           for i in 1:dim2
               calc2!(A,B,C,D,M,i)
           end
       end
run_calc3 (generic function with 1 method)
julia> @btime run_calc3($A,$B,$C,$D,$M);
305.709 μs (1979 allocations: 46.56 KiB)
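One further thing that may be worth checking: both `run_calc` and `run_calc3` read the loop bound from `dim2`, which is a non-constant global, and the Julia performance tips recommend avoiding untyped globals inside hot loops. A minimal sketch of the view-based version that takes the column range from the array instead is shown below. `run_calc_local` is a hypothetical name, and this variant was not part of the timings above, so whether and how much it reduces the allocations is something to measure rather than assume:

```julia
# Same kernel as in the question.
function calc!(a, b, c, d, E)
    @. E = a * b * ((d - c) / d)
    return nothing
end

# Hypothetical variant of run_calc: the loop range is derived from M,
# so the function no longer depends on the global `dim2`.
function run_calc_local(A, B, C, D, M)
    for i in axes(M, 2)
        @views calc!(A[:, i], B[:, i], C[:, i], D[:, i], M[:, i])
    end
    return M
end
```

It can be benchmarked like the other versions, e.g. with `@btime run_calc_local($A,$B,$C,$D,$M)`.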