Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Avoid memory allocation when indexing an array in Julia

Tags:

julia

UPDATE: Note that the relevant function in Julia v1+ is view

Question: I would like to index into an array without triggering memory allocation, especially when passing the indexed elements into a function. From reading the Julia docs, I suspect the answer revolves around using the sub function, but can't quite see how...

Working Example: I build a large vector of Float64 (x) and then an index to every observation in x.

N = 10000000
x = randn(N)
inds = [1:N]

Now I time the mean function over x and x[inds] (I run mean(randn(2)) first to avoid any compiler irregularities in the timing):

@time mean(x)
@time mean(x[inds])

It's an identical calculation, but as expected the results of the timings are:

elapsed time: 0.007029772 seconds (96 bytes allocated)
elapsed time: 0.067880112 seconds (80000208 bytes allocated, 35.38% gc time)

So, is there a way around the memory allocation problem for arbitrary choices of inds (and arbitrary choice of array and function)?

like image 971
Colin T Bowers Avatar asked Feb 02 '15 04:02

Colin T Bowers


1 Answers

Just use xs = sub(x, 1:N). Note that this is different from x = sub(x, [1:N]); on julia 0.3 the latter will fail, and on julia 0.4-pre the latter will be considerably slower than the former. On julia 0.4-pre, sub(x, 1:N) is just as fast as view:

julia> N = 10000000;

julia> x = randn(N);

julia> xs = sub(x, 1:N);

julia> using ArrayViews

julia> xv = view(x, 1:N);

julia> mean(x)
-0.0002491126429772525

julia> mean(xs)
-0.0002491126429772525

julia> mean(xv)
-0.0002491126429772525

julia> @time mean(x);
elapsed time: 0.015345806 seconds (27 kB allocated)

julia> @time mean(xs);
elapsed time: 0.013815785 seconds (96 bytes allocated)

julia> @time mean(xv);
elapsed time: 0.015871052 seconds (96 bytes allocated)

There are several reasons why sub(x, inds) is slower than sub(x, 1:N):

  • Each access xs[i] corresponds to x[inds[i]]; we have to look up two memory locations rather than one
  • If inds is not in order, you will get poor cache behavior in accessing the elements of x
  • It destroys the ability to use SIMD vectorization

In this case, the latter is probably the most important effect. This is not a Julia limitation; the same thing would happen were you to write the equivalent code in C, Fortran, or assembly.

Note that it's still faster to say sum(sub(x, inds)) than sum(x[inds]), (until the latter becomes the former, which it should by the time julia 0.4 is officially out). But if you have to do many operations with xs = sub(x, inds), in some circumstances it will be worth your while to make a copy, even though it allocates memory, just so you can take advantage of the optimizations possible when values are stored in contiguous memory.

like image 115
tholy Avatar answered Sep 18 '22 11:09

tholy