UPDATE: Note that the relevant function in Julia v1+ is view
Question: I would like to index into an array without triggering memory allocation, especially when passing the indexed elements into a function. From reading the Julia docs, I suspect the answer revolves around using the sub
function, but can't quite see how...
Working Example: I build a large vector of Float64
(x
) and then an index to every observation in x
.
N = 10000000
x = randn(N)
inds = [1:N]
Now I time the mean
function over x
and x[inds]
(I run mean(randn(2))
first to avoid any compiler irregularities in the timing):
@time mean(x)
@time mean(x[inds])
It's an identical calculation, but as expected the results of the timings are:
elapsed time: 0.007029772 seconds (96 bytes allocated)
elapsed time: 0.067880112 seconds (80000208 bytes allocated, 35.38% gc time)
So, is there a way around the memory allocation problem for arbitrary choices of inds
(and arbitrary choice of array and function)?
Just use xs = sub(x, 1:N)
. Note that this is different from x = sub(x, [1:N])
; on julia 0.3 the latter will fail, and on julia 0.4-pre the latter will be considerably slower than the former. On julia 0.4-pre, sub(x, 1:N)
is just as fast as view
:
julia> N = 10000000;
julia> x = randn(N);
julia> xs = sub(x, 1:N);
julia> using ArrayViews
julia> xv = view(x, 1:N);
julia> mean(x)
-0.0002491126429772525
julia> mean(xs)
-0.0002491126429772525
julia> mean(xv)
-0.0002491126429772525
julia> @time mean(x);
elapsed time: 0.015345806 seconds (27 kB allocated)
julia> @time mean(xs);
elapsed time: 0.013815785 seconds (96 bytes allocated)
julia> @time mean(xv);
elapsed time: 0.015871052 seconds (96 bytes allocated)
There are several reasons why sub(x, inds)
is slower than sub(x, 1:N)
:
xs[i]
corresponds to x[inds[i]]
; we have to look up two memory locations rather than oneinds
is not in order, you will get poor cache behavior in accessing the elements of x
In this case, the latter is probably the most important effect. This is not a Julia limitation; the same thing would happen were you to write the equivalent code in C, Fortran, or assembly.
Note that it's still faster to say sum(sub(x, inds))
than sum(x[inds])
, (until the latter becomes the former, which it should by the time julia 0.4 is officially out). But if you have to do many operations with xs = sub(x, inds)
, in some circumstances it will be worth your while to make a copy, even though it allocates memory, just so you can take advantage of the optimizations possible when values are stored in contiguous memory.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With