Why does array += (without @.) produce so much memory allocation?

Tags:

julia

I don't understand why the += operation on arrays produces so much memory allocation, or why applying @. makes the allocation disappear.

function loop()
    a = randn(10)
    total = similar(a)

    for i=1:1000
        total += a
    end
end

function loopdot()
    a = randn(10)
    total = similar(a)

    for i=1:1000
        @. total += a
    end
end


loop()
loopdot()

using Profile               # clear_malloc_data lives in the Profile stdlib
Profile.clear_malloc_data() # reset counts so only the next calls are measured

loop()
loopdot()

produces the following per-line allocation counts (with Julia started with --track-allocation=user):

160000         total += a

and

0         @. total += a
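The same difference can also be seen interactively with Base's @allocated macro, without restarting Julia with allocation tracking (a minimal sketch; add and add! are hypothetical helper names introduced here for illustration):

a = randn(10)
total = similar(a)
add(total, a) = total + a               # allocating version of total += a
add!(total, a) = (total .+= a; total)   # in-place version, like @. total += a
add(total, a); add!(total, a)           # warm up so compilation isn't measured
@allocated add(total, a)                # nonzero: one temporary Vector{Float64}
@allocated add!(total, a)               # 0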
asked Jan 29 '23 by fratrik

1 Answer

total += a is the same as total = total + a, which is a vectorized operation like:

out = similar(a)            # allocate a temporary array
for i in eachindex(a)
  out[i] = total[i] + a[i]
end
total = out                 # rebind `total` to the temporary

since internally it is lowered to the function call

total = +(total,a)

This is just like MATLAB, Python, or R: a temporary array is allocated for the vectorized operation, and then the = rebinds total to that new array. This is why vectorized operations are slower than traditional low-level loops, and it is one of the main reasons something like NumPy can be faster than pure Python but still cannot fully match C (because of these temporaries!).
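One way to see the rebinding directly is to compare the array's memory address before and after each kind of update (a minimal sketch; pointer is used purely for illustration):

total = zeros(3); a = ones(3)
before = pointer(total)
total += a                  # total = total + a: allocates a fresh array
pointer(total) == before    # false: `total` was rebound to the temporary
before = pointer(total)
total .+= a                 # in-place broadcast assignment
pointer(total) == before    # true: same array, mutated in place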

@. total += a is the same as total .= total .+ a. The Julia blog post "More Dots: Syntactic Loop Fusion in Julia" explains that in Julia dot fusion happens semantically via anonymous functions, so this corresponds to doing the following:

# Build an anonymous function for the fused operation
f! = (out, x, y, i) -> (out[i] = x[i] + y[i])
# Loop over it; since the assignment is `.=`, no temporary is made
for i in eachindex(a)
  f!(total, total, a, i)
end

which updates total in-place without creating a temporary array.

Fusion in Julia happens semantically: the conversion of dotted operations into an anonymous function plus a broadcast! call (essentially the loop written above) is done at parse time, and the anonymous function is compiled, which makes this efficient. It is useful for other reasons as well: by overloading broadcast! on a generic f!, packages like GPUArrays.jl automatically build single efficient kernels that do in-place updates on the GPU. This is in contrast to MATLAB, Python, and R, where each vectorized function is a separate function call that must compute and return its result, hence the temporary arrays.
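For reference, a few equivalent spellings of the fused, allocation-free update (a sketch assuming Julia >= 1.0; Meta.@lower prints the lowered form so you can verify the fusion yourself):

total .= total .+ a               # what @. total += a expands to
broadcast!(+, total, total, a)    # the explicit broadcast! call underneath
Meta.@lower @. total += a         # inspect the lowering in the REPL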

answered Feb 22 '23 by Chris Rackauckas