On this page it says that methods <code>push!()</code> and <code>append!()</code> are very efficient. My question is: exactly how efficient are they? Namely, if one knows the size of the final array, is it still faster to preallocate the array, or would growing it incrementally using <code>append!()</code> / <code>push!()</code> be just as efficient? Now consider the case when one does not know the size of the final array, for example when merging multiple arrays into one big array (call it <code>A</code>). There are two ways to achieve this: <ol> <li> <code>append!()</code>-ing each array to <code>A</code>, whose size has not been preallocated.</li> <li>First sum the dimensions of each array to find the final size of the merged array <code>A</code>. Then preallocate <code>A</code> and copy over the contents of each array.</li> </ol> Which one would be more efficient in this case?


How efficient are push!() and append!() methods in Julia?

2 Answers

The answer to a question like this is usually: "it depends". For example, what size array are you trying to make? What is the element-type of the array?

But if you're just after a heuristic, why not run a simple speed test? For example, the following snippet:

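# Fill a pre-allocated vector.
function f1(N::Int)
    x = Vector{Int}(undef, N)  # written as Array(Int, N) in very old Julia versions
    for n = 1:N
        x[n] = n
    end
    return x
end

# Grow an initially empty vector with push!.
function f2(N::Int)
    x = Int[]
    for n = 1:N
        push!(x, n)
    end
    return x
end

# Call each function once so that compilation is not included in the timings.
f1(2)
f2(2)

# Note: on a 64-bit machine, 5000000000 Int values need about 40 GB; lower N if memory is limited.
N = 5000000000
@time f1(N)
@time f2(N)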

suggests that using push! is about 6 times slower than pre-allocating. If you were using append! to add larger blocks in fewer steps, the multiplier would almost certainly be smaller.
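The merging scenario from the question can be timed in the same spirit. The following is only a rough sketch: `merge_append` and `merge_prealloc` are hypothetical helper names, the input sizes are arbitrary, and a single `@time` call is a crude measurement rather than a proper benchmark.

```julia
# Hypothetical helper: merge by growing `A` with append!.
function merge_append(arrays::Vector{Vector{Int}})
    A = Int[]
    for a in arrays
        append!(A, a)
    end
    return A
end

# Hypothetical helper: sum the lengths first, pre-allocate, then copy each block.
function merge_prealloc(arrays::Vector{Vector{Int}})
    total = 0
    for a in arrays
        total += length(a)
    end
    A = Vector{Int}(undef, total)
    pos = 1
    for a in arrays
        copyto!(A, pos, a, 1, length(a))  # copy all of `a` into A starting at index pos
        pos += length(a)
    end
    return A
end

# Arbitrary test data: 1000 vectors of 1000 elements each.
arrays = [collect(1:1000) .+ k for k in 1:1000]

# Warm-up calls so that compilation is not included in the timings.
merge_append(arrays)
merge_prealloc(arrays)

@time merge_append(arrays)
@time merge_prealloc(arrays)
```

Because each `append!` call adds a whole block at once, the number of growth steps depends on the number of input arrays rather than the number of elements, which is why the gap is expected to be smaller than in the element-by-element `push!` test.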

When interpreting these numbers, resist the knee-jerk reaction of "What!? 6-times slower!?". This number needs to be placed in the context of how important array building is to your entire program/function/subroutine. For example, if array building comprises only 1% of the run-time of your routine (for most typical routines, array building would comprise much less than 1%), then if your routine runs for 100 seconds, 1 second is spent building arrays. Multiply that by 6 to get 6 seconds. 99 seconds + 6 seconds = 105 seconds. So, using push! instead of pre-allocating increases the runtime of your whole program by 5%. Unless you work in high-frequency trading, you're probably not going to care about that.
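The arithmetic above can be written as a short calculation. This is only a sketch: `overall_increase` is a hypothetical helper name, and it assumes the rest of the program is unaffected by the change.

```julia
# Hypothetical helper (not from the original answer): relative increase in total
# runtime when a part taking `fraction` of the runtime becomes `factor` times slower.
overall_increase(fraction, factor) = fraction * (factor - 1)

# Array building is 1% of a 100-second run and becomes 6 times slower.
total = 100.0
new_total = total * (1 + overall_increase(0.01, 6))
println(new_total)   # about 105 seconds, i.e. roughly 5% longer overall
```

Plugging in a larger fraction, such as 0.5, shows when the choice between push! and pre-allocation starts to matter for the whole program.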

For myself, my usual rule is this: if I can pre-allocate easily, then I pre-allocate. But if push! makes the routine much easier to code, with lower possibility of introducing bugs, and less messing around trying to pre-determine the appropriate array size, then I use push! without a second thought.

Final note: if you want to actually look at the specifics of how push! works, you'll need to delve into the C routines, since the Julia source just wraps a ccall.

UPDATE: OP questioned in the comments the difference between push! and an operation like array(end+1) = n in MATLAB. I haven't coded in MATLAB recently, but I do keep a copy on my machine since the code for all my older papers is in MATLAB. My current version is R2014a. My understanding is that in this version of MATLAB, adding to the end of the array will re-allocate the entire array. In contrast, push! in Julia works, to the best of my knowledge, much like lists in .NET: the memory allocated to the vector is added dynamically in blocks as the vector grows. This massively reduces the amount of re-allocation that needs to be performed, although my understanding is that some re-allocation is still necessary (I'm happy to be corrected on this point). So push! should work much faster than adding to an array in MATLAB. To check, we can run the following MATLAB code:

N = 10000000;
tic
x = ones(N, 1);
for n = 1:N
    x(n) = n;
end
toc


N = 10000000;
tic
x = [];
for n = 1:N
    x(end+1) = n;
end
toc

I get:

Elapsed time is 0.407288 seconds.
Elapsed time is 1.802845 seconds.

So, about a 5-times slowdown. Given the extreme non-rigour applied in the timing methodology, one might be tempted to say this is equivalent to the Julia case. But wait, if we re-run the exercise in Julia with N = 10000000, the timings are 0.01 and 0.07 seconds. The sheer difference in magnitude between these numbers and the MATLAB numbers makes me very nervous about making claims about what is actually happening under the hood, and about whether it is legitimate to compare the 5-times slowdown in MATLAB to the 6-times slowdown in Julia. Basically, I'm now out of my depth. Maybe someone who knows more about what MATLAB actually does under the hood can offer more insight. Regarding Julia, I'm not much of a C-coder, so I doubt I'll get much insight from looking through the source (which is publicly available, unlike MATLAB's).

answered Oct 25 '22 14:10

Colin T Bowers

push! will always be slower than inserting into a pre-allocated array, if for no other reason than push! (1) inserts the element, just as when you do so manually, and (2) increments the length of the array. Two operations cannot be faster than one, when one is part of two.

However, as noted in other answers, the gap is often not so large as to be something to be concerned about. Internally (last time I checked the code), Julia uses a grow-by-a-factor-of-2 strategy, so that you only need about log2(N) reallocations.
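To see where the log2(N) figure comes from, here is a small simulation of a buffer whose capacity doubles whenever it is full. It is a sketch of the general doubling strategy only: `count_reallocations` is a hypothetical name, the starting capacity of 1 is an assumption, and the code does not reproduce Julia's actual internal growth policy, which may differ between versions.

```julia
# Simulate a capacity-doubling buffer (assumed policy, not Julia's internal code)
# and count how often it has to grow while N elements are pushed one at a time.
function count_reallocations(N::Int)
    capacity = 1          # assumed starting capacity
    reallocations = 0
    for len in 1:N
        if len > capacity # buffer is full: double it
            capacity *= 2
            reallocations += 1
        end
    end
    return reallocations
end

println(count_reallocations(1_000_000))  # 20, since 2^19 < 1_000_000 <= 2^20
```

Under this model, a million calls to push! trigger only 20 growth steps, which is why the cost per element stays small even without pre-allocation.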

If you do know the size of the array in advance, you can eliminate reallocations by using sizehint!. As you can easily test for yourself, this does not eliminate the performance penalty relative to insertion into a preallocated array, but it can reduce it.
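A minimal sketch of the sizehint! approach follows. The name `f3` is hypothetical, chosen to continue the f1/f2 naming from the other answer, and the timing call is a crude measurement rather than a proper benchmark.

```julia
# Grow a vector with push!, but reserve capacity up front with sizehint!.
function f3(N::Int)
    x = Int[]
    sizehint!(x, N)   # reserves room for N elements; length(x) is still 0
    for n = 1:N
        push!(x, n)   # still updates the length on every call
    end
    return x
end

f3(2)                 # warm-up so that compilation is not included in the timing
@time f3(10_000_000)
```

Unlike pre-allocating with a fixed length, this keeps the push! style of code, so it remains correct even if the final size turns out to differ from the hint.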

answered Oct 25 '22 15:10

tholy

Tags:

arrays

julia

aberdysh
