Going through Julia's performance tips, I haven't found any suggestions on how to speed up code that uses three-dimensional arrays.
From my understanding, a d-element Array{Array{Float64,2},1} performs best when d (the third dimension) is small. However, I am not sure whether this is still the case when d is large.
Is there any tutorial on this topic for Julia?
Example 1a (d=50)
```julia
x = [zeros(100, 10) for d=1:50];
@time for d=1:50
    x[d] = rand(100,10);
end
```
0.000100 seconds (50 allocations: 396.875 KB)
Example 1b (d=50)
```julia
y = zeros(100, 10, 50);
@time for d=1:50
    y[:,:,d] = rand(100,10);
end
```
0.000257 seconds (200 allocations: 400.781 KB)
Example 2a (d=50000)
```julia
x = [zeros(100, 10) for d=1:50000];
@time for d=1:50000
    x[d] = rand(100,10);
end
```
0.410813 seconds (99.49 k allocations: 388.328 MB, 81.88% gc time)
Example 2b (d=50000)
```julia
y = zeros(100, 10, 50000);
@time for d=1:50000
    y[:,:,d] = rand(100,10);
end
```
0.185929 seconds (298.98 k allocations: 392.898 MB, 6.83% gc time)
> From my understanding d-element Array{Array{Float64,2},1} would perform best when d (the third dimension) is small. However, I am not sure whether this is the case when d is large.

No, it depends more on how you use it. Let A be an Array{Array{Float64,2},1}: an array of pointers to matrices. Each element of A is a pointer (a reference) to a matrix, so A[i] just returns that reference, which is cheap. Let A2 be an Array{Float64,3}: a contiguous array of floats. It is really just an indexing scheme over a single linear slab of memory (and it also supports a linear index A2[i] that runs through the whole slab in that linear order).
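A minimal sketch of the two layouts, using small made-up sizes (three 2×2 matrices) purely for illustration. It checks that A[1] hands back the very matrix stored in A rather than a copy, and that A2's linear index walks the slab in column-major order.

```julia
# Array of arrays: a Vector holding references to separate matrices
A = [zeros(2, 2) for _ in 1:3]          # Vector{Matrix{Float64}}
# 3D array: one contiguous block of 2*2*3 = 12 Float64 values
A2 = zeros(2, 2, 3)                     # Array{Float64,3}

m = A[1]               # cheap: returns the reference, no data is copied
m[1, 1] = 5.0
println(A[1][1, 1])    # 5.0, because m and A[1] are the same object
println(m === A[1])    # true

# Linear indexing runs through A2's memory in column-major order,
# so element (1, 1, 2) is the 5th value in the slab.
A2[1, 1, 2] = 7.0
println(A2[5])                        # 7.0
println(LinearIndices(A2)[1, 1, 2])   # 5
```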
The latter has some advantages because it is contiguous. There is no indirection, so looping over all of A2's values will be faster. A has to dereference two pointers to get a value, so a simple 3D loop will be slower unless you take care to dereference each inner matrix only once. Also, you can get views of the matrices via @view A2[:,:,1] etc., but note that A2[:,:,1] by itself makes a copy of the matrix. A[1] is naturally a view, because it returns the reference to the matrix; if you want a copy, you have to ask for it explicitly with copy(A[1]).

Because A is just a linear array of pointers, push!ing a new matrix onto it is cheap: it only grows a relatively small array (and push! is automatically amortized) by adding a new pointer at the end. This is why packages like DifferentialEquations.jl use arrays of arrays to build time series instead of the more traditional matrix.
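A small sketch of the copy-versus-view behavior and of how the two structures grow, again with arbitrary 2×2 sizes. Using cat to grow the 3D array is my own choice for illustration, one way to do it rather than something the answer prescribes; the point is that it has to build a new array, whereas push! on A only appends a pointer.

```julia
A2 = zeros(2, 2, 3)

s = A2[:, :, 1]          # slicing makes a copy
s[1, 1] = 1.0
println(A2[1, 1, 1])     # 0.0: A2 is unchanged

v = @view A2[:, :, 1]    # a view shares A2's memory
v[1, 1] = 1.0
println(A2[1, 1, 1])     # 1.0: the write went through to A2

A = [zeros(2, 2) for _ in 1:3]
c = copy(A[1])           # an explicit copy decouples it from A
c[1, 1] = 1.0
println(A[1][1, 1])      # 0.0: A[1] is unchanged

# Growing: push! only appends one pointer to A ...
push!(A, ones(2, 2))
println(length(A))       # 4
# ... while growing A2 along the third dimension builds a new array
B2 = cat(A2, ones(2, 2); dims = 3)
println(size(B2))        # (2, 2, 4)
```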
So they are different tools with different advantages and disadvantages.
As for your timings, you're measuring two different things. x[d] = rand(100,10) creates a new matrix and stores its reference in x (the old zero matrix is simply no longer referenced). y[:,:,d] = rand(100,10) creates a new matrix and then loops through its values to overwrite the corresponding values of y, so it does extra copying work. That explains why the 3D version is slower in Example 1. What your comparison leaves out are the allocation-free versions.
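As a sketch of what allocation-free versions of the question's own loops could look like, the functions below use Random.rand! from the standard library to overwrite existing storage instead of allocating a fresh 100×10 matrix per iteration. The names fill_nested! and fill_3d! are hypothetical, chosen only for this example.

```julia
using Random

# Hypothetical helper: refill every matrix of an array of arrays in place.
function fill_nested!(x::Vector{Matrix{Float64}})
    for d in eachindex(x)
        rand!(x[d])                 # overwrites the existing matrix; x keeps the same references
    end
    return x
end

# Hypothetical helper: refill each 100×10 slice of a 3D array in place.
function fill_3d!(y::Array{Float64,3})
    for d in axes(y, 3)
        rand!(view(y, :, :, d))     # the view writes straight into y's memory
    end
    return y
end

x = [zeros(100, 10) for _ in 1:50]
y = zeros(100, 10, 50)

fill_nested!(x); fill_3d!(y)        # first calls include compilation
@time fill_nested!(x)
@time fill_3d!(y)
```

Wrapping the loops in functions also avoids timing code at global scope, which the Julia performance tips advise against.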
```julia
function f2()
    y = zeros(100, 10, 50)
    # Fill the existing 3D array in place: no new matrices are allocated
    @time for i in eachindex(y)
        y[i] = rand()
    end
    y
end
```
In the small case, this is on par with the array-creation version (Example 1a). You can't naively do this with the array of arrays, but, as I said, if you dereference the pointer to each matrix only once, it does really well:
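```julia
function f()
    x = [zeros(100, 10) for d=1:50]
    @time @inbounds for d in eachindex(x)
        xd = x[d]           # dereference the pointer to the d-th matrix once
        for i in eachindex(xd)
            xd[i] = rand()  # then write into that matrix's contiguous memory
        end
    end
    x
end
```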
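To make "dereference each inner matrix only once" concrete, here is a hedged sketch contrasting a naive loop that looks up x[d] on every inner iteration with one that hoists the lookup, as f does. The function names are hypothetical. The compiler may hoist the lookup by itself, so the measured gap can be small; this shows the pattern rather than a guaranteed speedup.

```julia
# Hypothetical example: sum every element of an array of matrices.

# Looks up x[d] (loading the pointer to the d-th matrix) in every inner iteration.
function sum_naive(x::Vector{Matrix{Float64}})
    s = 0.0
    for d in eachindex(x)
        for j in axes(x[d], 2), i in axes(x[d], 1)
            s += x[d][i, j]
        end
    end
    return s
end

# Dereferences each matrix once, then walks its contiguous memory.
function sum_hoisted(x::Vector{Matrix{Float64}})
    s = 0.0
    for d in eachindex(x)
        xd = x[d]
        for i in eachindex(xd)
            s += xd[i]
        end
    end
    return s
end

x = [rand(100, 10) for _ in 1:50]
sum_naive(x); sum_hoisted(x)    # warm up (compile) first
@time sum_naive(x)
@time sum_hoisted(x)
```

Both loops visit elements in the same column-major order, so they return the same sum.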
So arrays of arrays can be great data structures in the right cases. The library RecursiveArrayTools.jl was created to take better advantage of them. For example, A3 = VectorOfArray(A) gives A3 the same indexing structure as A2 by lazily transforming A3[i,j,k] into A[k][i,j]. It keeps the advantages of A, while automatically making sure that broadcasting is done in the efficient way shown in f. Another tool along these lines is ArrayPartition, which allows heterogeneous element types in a broadcast-performant way.
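To see what lazily transforming A3[i,j,k] into A[k][i,j] means, here is a minimal hand-rolled sketch of the same idea. NestedView is a hypothetical name made up for this example; it is not part of RecursiveArrayTools.jl and skips everything that makes VectorOfArray useful in practice (broadcasting, differently sized inner arrays, and so on).

```julia
# Hypothetical type for illustration only; this is NOT how RecursiveArrayTools.jl
# implements VectorOfArray. Assumes all inner matrices have the same size.
struct NestedView{T} <: AbstractArray{T,3}
    parts::Vector{Matrix{T}}
end

# Size: (rows, cols) of the first matrix, then the number of matrices.
Base.size(v::NestedView) = (size(first(v.parts))..., length(v.parts))

# Lazily map v[i, j, k] to v.parts[k][i, j]; no data is copied.
Base.@propagate_inbounds Base.getindex(v::NestedView, i::Int, j::Int, k::Int) =
    v.parts[k][i, j]

Base.@propagate_inbounds function Base.setindex!(v::NestedView, val, i::Int, j::Int, k::Int)
    v.parts[k][i, j] = val
    return v
end

A = [fill(Float64(k), 2, 3) for k in 1:4]   # 4 matrices, the k-th filled with k
V = NestedView(A)

println(size(V))       # (2, 3, 4)
println(V[1, 2, 3])    # 3.0, read from A[3][1, 2]
V[2, 1, 4] = -1.0      # writes into A[4] itself
println(A[4][2, 1])    # -1.0
```

Because NestedView subtypes AbstractArray, generic operations such as sum(V) or V[:, :, 2] work on it through these scalar getindex and size methods alone.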
So yeah, it's not always the right tool, but these heterogeneous and recursive arrays are great tools when used correctly.