Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimal way to append to numpy array

I have a numpy array and I can simply append an item to it using append, like this:

numpy.append(myarray, 1)

In this case I just appended the integer 1.

But is this the quickest way to append to an array? I have a very long array that runs into the tens of thousands.

Or is it better to index the array and assign it directly? Like this:

myarray[123] = 1
like image 224
user961627 Avatar asked Sep 03 '14 16:09

user961627


Video Answer


2 Answers

Appending to numpy arrays is very inefficient. This is because the interpreter needs to find and assign memory for the entire array at every single step. Depending on the application, there are much better strategies.

If you know the length in advance, it is best to pre-allocate the array using a function like np.ones, np.zeros, or np.empty.

desired_length = 500
results = np.empty(desired_length)
for i in range(desired_length):
    results[i] = i**2

If you don't know the length, it's probably more efficient to keep your results in a regular list and convert it to an array afterwards.

results = []
while condition:
    a = do_stuff()
    results.append(a)
results = np.array(results)

Here are some timings on my computer.

def pre_allocate():
    results = np.empty(5000)
    for i in range(5000):
        results[i] = i**2
    return results

def list_append():
    results = []
    for i in range(5000):
        results.append(i**2)
    return np.array(results)

def numpy_append():
    results = np.array([])
    for i in range(5000):
        np.append(results, i**2)
    return results

%timeit pre_allocate()
# 100 loops, best of 3: 2.42 ms per loop

%timeit list_append()
# 100 loops, best of 3: 2.5 ms per loop

%timeit numpy_append()
# 10 loops, best of 3: 48.4 ms per loop

So you can see that both pre-allocating and using a list then converting are much faster.

like image 133
Roger Fan Avatar answered Oct 03 '22 00:10

Roger Fan


If you know the size of the array at the end of the run, then it is going to be much faster to pre-allocate an array of the appropriate size and then set the values. If you do need to append on-the fly, it's probably better to try to not do this one element at a time, instead appending as few times as possible to avoid generating many copies over and over again. You might also want to do some profiling of the difference in timings of np.append, np.hstack, np.concatenate. etc.

like image 33
JoshAdel Avatar answered Oct 03 '22 01:10

JoshAdel