Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why use numpy over list based on speed?

With reference to Why NumPy instead of Python lists?

tom10 said :

Speed: Here's a test on doing a sum over a list and a NumPy array, showing that the sum on the NumPy array is 10x faster (in this test -- mileage may vary).

But my test using the following code:

import numpy as np
import time as time

N = 100000

#using numpy
start = time.time()
array = np.array([])

for i in range(N):
    array = np.append(array, i)

end = time.time()
print ("Using numpy: ", round(end-start, 2), end="\n")

#using list
start = time.time()
list = []

for i in range(N):
    list.append(i)

list = np.array(list)   
end = time.time()
print ("Using list : ", round(end-start, 2), end="\n")

Give the result:

Using numpy: 8.35
Using list : 0.02

It is true that when using "append", list is better than numpy ?

like image 887
Pak Long Avatar asked Oct 21 '17 07:10

Pak Long


People also ask

What is the advantage of NumPy over list?

NumPy uses much less memory to store data The NumPy arrays takes significantly less amount of memory as compared to python lists. It also provides a mechanism of specifying the data types of the contents, which allows further optimisation of the code.

Why are NumPy arrays used over lists?

NumPy arrays are faster and more compact than Python lists. An array consumes less memory and is convenient to use. NumPy uses much less memory to store data and it provides a mechanism of specifying the data types. This allows the code to be optimized even further.

How much faster is NumPy than lists?

NumPy aims to provide an array object that is up to 50x faster than traditional Python lists. The array object in NumPy is called ndarray, it provides a lot of supporting functions that make working with ndarray very easy. Arrays are very frequently used in data science, where speed and resources are very important.

Is it faster to append to NumPy array or list?

It's faster to append list first and convert to array than appending NumPy arrays.


1 Answers

To answer your question, yes. Appending to an array is an expensive operation, while lists make it relatively cheap (see Internals of Python list, access and resizing runtimes for why). However, that's no reason to abandon numpy. There are other ways to easily add data to a numpy array.

There are surprising (to me, anyway) amount of ways to do this. Skip to the bottom to see benchmarks for each of them.

Probably the most common is to simply pre-allocate the array, and index into that,

#using preallocated numpy
start = time.time()
array = np.zeros(N)

for i in range(N):
    array[i] = i

end = time.time()
print ("Using preallocated numpy: ", round(end-start, 5), end="\n")

Of course, you can preallocate the memory for a list too, so lets include that for a benchmark comparison.

#using preallocated list
start = time.time()
res = [None]*N

for i in range(N):
    res[i] = i

res = np.array(res)
end = time.time()
print ("Using preallocated list : ", round(end-start, 5), end="\n")

Depending on your code, it may also be helpful to use numpy's fromiter function, which uses the results of an iterator to initialize the array.

#using numpy fromiter shortcut
start = time.time()

res = np.fromiter(range(N), dtype='float64') # Use same dtype as other tests

end = time.time()
print ("Using fromiter : ", round(end-start, 5), end="\n")

Of course, using a built in iterator isn't terribly flexible so let's try a custom iterator as well,

#using custom iterator
start = time.time()
def it(N):
    i = 0
    while i < N:
        yield i
        i += 1

res = np.fromiter(it(N), dtype='float64') # Use same dtype as other tests

end = time.time()
print ("Using custom iterator : ", round(end-start, 5), end="\n")

That's two very flexible ways of using numpy. The first, using a preallocated array, is the most flexible. Let's see how they compare:

Using numpy:  2.40017
Using list :  0.0164
Using preallocated numpy:  0.01604
Using preallocated list :  0.01322
Using fromiter :  0.00577
Using custom iterator :  0.01458

Right off, you can see that preallocating makes numpy much faster than using lists, although preallocating the list brings both to about the same speed. Using a builtin iterator is extremely fast, although the iterator performance drops back into the range of the preallocated array and list when a custom iterator is used.

Converting code directly to numpy often has poor performance, as with append. Finding an approach using numpy's methods can almost always give a substantial improvement. In this case, preallocating the array or expressing the calculation of each element as an iterator to get similar performance to python lists. Or use a vanilla python list since the performance is about the same.

EDITS: Original answer also included np.fromfunction. This was removed since it didn't fit the use case of adding one element at a time, fromfunction actually initializes the array and uses numpy's broadcasting to make a single function call. It is about a hundred times faster, so if you can solve your problem using broadcasting don't bother with these other methods.

like image 94
user2699 Avatar answered Nov 03 '22 00:11

user2699