Julia Memory Allocation for Addition of Two Matrices in place

Question

I'm curious why Julias implementation of matrix addition appears to make a copy. Heres an example:

foo1=rand(1000,1000)
foo2=rand(1000,1000)
foo3=rand(1000,1000)

julia> @time foo1=foo2+foo3;
  0.001719 seconds (9 allocations: 7.630 MB)
julia> sizeof(foo1)/10^6
8.0

The amount of memory allocated is roughly the same as the memory required by a matrix of these dimensions.

It looks like in order to process foo2+foo3 memory is allocated to store the result and then foo1 is assigned to it by reference.

Does this mean that for most linear algebra operations we need to call BLAS and LAPACK functions directly to do things in place?

spencerlyon2 · Accepted Answer

To understand what is going on here, let's consider what foo1 = foo2 + foo3 actually does.

First it evaluates foo2 + foo3. To do this it will allocate a new temporary array to hold the output
Then it will bind the name foo1 to this new temporary array, undoing all effort you put in to pre-allocate the output array.

In short, you see that memory usage is about that of the resultant array because the routine is indeed allocating new memory for an array of that size.

Here are some alternatives:

write a loop
use broadcast!
We could try do do copy!(foo1, foo2+foo3) and then the array you pre-allocated will be filled, but it will still allocate the temporary (see below)
The original version posted here

Here's some code for those 4 cases

julia> function with_loop!(foo1, foo2, foo3)
           for i in eachindex(foo2)
              foo1[i] = foo2[i] + foo3[i]
           end
       end

julia> function with_broadcast!(foo1, foo2, foo3)
           broadcast!(+, foo1, foo2, foo3)
       end

julia> function with_copy!(foo1, foo2, foo3)
           copy!(foo1, foo2+foo3)
       end

julia> function original(foo1, foo2, foo3)
           foo1 = foo2 + foo3
       end

Now let's time these functions

julia> for f in [:with_broadcast!, :with_loop!, :with_copy!, :original]
           @eval $f(foo1, foo2, foo3) # compile
           println("timing $f")
           @eval @time $f(foo1, foo2, foo3)
       end
timing with_broadcast!
  0.001787 seconds (5 allocations: 192 bytes)
timing with_loop!
  0.001783 seconds (4 allocations: 160 bytes)
timing with_copy!
  0.003604 seconds (9 allocations: 7.630 MB)
timing original
  0.002702 seconds (9 allocations: 7.630 MB, 97.91% gc time)

You can see that with_loop! and broadcast! do about the same and both are much faster and more efficient than the others. with_copy! and original are both slower and use more memory.

In general, to do inplace operations I'd recommend starting out by writing a loop

Dan Getz · Answer

First, read @spencerlyon2's answer. Another approach is to use Dahua Lin's package Devectorize.jl. It defines the @devec macro which automates the translations of vector (array) expressions into looped code.

In this example we will define with_devec!(foo1,foo2,foo3) as follows,

julia> using Devectorize      # install with Pkg.add("Devectorize")

julia> with_devec!(foo1,foo2,foo3) = @devec foo1[:]=foo2+foo3

Running the benchmark achieves the 4 allocations results.

Julia Memory Allocation for Addition of Two Matrices in place

Tags:

in-place

lapack

blas

julia

Lindon

2 Answers

spencerlyon2

Dan Getz

Recent Activity

Donate For Us

Julia Memory Allocation for Addition of Two Matrices in place

Tags:

in-place

lapack

blas

julia

Lindon

2 Answers

spencerlyon2

Dan Getz

Related questions

Recent Activity

Donate For Us