I was under the impression that numpy would be faster for list operations, but the following example seems to indicate otherwise:
import numpy as np
import time
def ver1():
    a = [i for i in range(40)]
    b = [0 for i in range(40)]
    for i in range(1000000):
        for j in range(40):
            b[j] = a[j]

def ver2():
    a = np.array([i for i in range(40)])
    b = np.array([0 for i in range(40)])
    for i in range(1000000):
        for j in range(40):
            b[j] = a[j]

t0 = time.time()
ver1()
t1 = time.time()
ver2()
t2 = time.time()
print(t1-t0)
print(t2-t1)
Output is:
4.872278928756714
9.120521068572998
(I'm running 64-bit Python 3.4.3 on Windows 7, on an i7 920.)
I do understand that this isn't the fastest way to copy a list, but I'm trying to find out if I'm using numpy incorrectly. Or is it the case that numpy is slower for this kind of operation and is only more efficient in more complex operations?
EDIT:
I also tried the following, which just does a direct copy via b[:] = a, and numpy is still about twice as slow:
import numpy as np
import time
def ver6():
    a = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
    b = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
    for i in range(1000000):
        b[:] = a

def ver7():
    a = np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
    b = np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
    for i in range(1000000):
        b[:] = a

t0 = time.time()
ver6()
t1 = time.time()
ver7()
t2 = time.time()
print(t1-t0)
print(t2-t1)
Output is:
0.36202096939086914
0.6750380992889404
You're using NumPy wrong. NumPy's efficiency relies on doing as much work as possible in C-level loops instead of interpreted code. When you do
for j in range(40):
    b[j] = a[j]
That's an interpreted loop, with all the intrinsic interpreter overhead and more, because NumPy's indexing logic is way more complex than list indexing, and NumPy needs to create a new element wrapper object on every element retrieval. You're not getting any of the benefits of NumPy when you write code like this.
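You can observe that wrapper-object creation directly (a minimal check, not from the original post; the names a and lst are just for illustration):

import numpy as np

a = np.arange(40)
lst = list(range(40))

# Indexing a NumPy array materializes a fresh scalar wrapper object
# around the raw C value, so two reads give two distinct objects.
print(type(a[0]))        # numpy.int64 (or numpy.int32 on some platforms)
print(a[0] is a[0])      # False: a new wrapper object per access
print(lst[0] is lst[0])  # True: the list hands back the stored object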
You need to write the code in such a way that the work happens in C:
b[:] = a
This would also improve the efficiency of the list operation, but it's much more important for NumPy.
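For a sense of scale, here is a rough timeit sketch (numbers will vary by machine) comparing the slice copy at a few sizes; NumPy's fixed per-call overhead loses at 40 elements, but the dense C-level copy pulls ahead as the array grows:

import timeit

# Compare the C-level slice copy b[:] = a for lists and NumPy arrays.
# At tiny sizes NumPy's per-call overhead dominates; at larger sizes
# the dense, homogeneous copy wins.
for n in (40, 1000, 100000):
    t_list = timeit.timeit(
        "b[:] = a",
        setup="a = list(range(%d)); b = [0] * %d" % (n, n),
        number=10000)
    t_numpy = timeit.timeit(
        "b[:] = a",
        setup=("import numpy as np; "
               "a = np.arange(%d); b = np.zeros(%d, dtype=a.dtype)" % (n, n)),
        number=10000)
    print(n, t_list, t_numpy)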
Most of what you are seeing is Python object creation from C native types.

A Python list is, at its heart, an array of PyObject pointers. When a and b are both Python lists, doing b[i] = a[i] implies:

- decreasing the reference count of the object currently stored in b[i],
- increasing the reference count of the object stored in a[i], and
- copying the pointer stored in a[i] into b[i].

But if a and b are NumPy arrays, things are a little more elaborate, and the same b[i] = a[i] then requires:

- creating a Python object wrapping the native value stored at a[i], see this,
- parsing that Python object back into a native value to store in b[i], see here, and
- decreasing the reference count of the temporary Python object.

So the difference is mostly in creating and disposing of that intermediate Python object, which lists do not need to do.
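If you genuinely need a Python-level element loop, one way to sidestep the per-element wrapper cost (a sketch, not part of the original answer) is to convert the source array to plain Python ints in a single C pass with tolist():

import timeit

setup = ("import numpy as np; "
         "a = np.arange(40); b = np.zeros(40, dtype=a.dtype)")

# Element-wise copy through NumPy indexing: one wrapper object is
# created and destroyed per iteration.
loop_numpy = """
for j in range(40):
    b[j] = a[j]
"""

# tolist() extracts all values as plain Python ints in one C pass, so
# the read side of the loop no longer builds a wrapper per element.
loop_tolist = """
vals = a.tolist()
for j in range(40):
    b[j] = vals[j]
"""

print(timeit.timeit(loop_numpy, setup=setup, number=100000))
print(timeit.timeit(loop_tolist, setup=setup, number=100000))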