
Why is numpy list access slower than vanilla python?

I was under the impression that numpy would be faster for list operations, but the following example seems to indicate otherwise:

import numpy as np
import time

def ver1():
    a = [i for i in range(40)]
    b = [0 for i in range(40)]
    for i in range(1000000):
        for j in range(40):
            b[j]=a[j]

def ver2():
    a = np.array([i for i in range(40)])
    b = np.array([0 for i in range(40)])
    for i in range(1000000):
        for j in range(40):
            b[j]=a[j]

t0 = time.time()
ver1()
t1 = time.time()
ver2()
t2 = time.time()

print(t1-t0)
print(t2-t1)

Output is:

4.872278928756714
9.120521068572998

(I'm running 64-bit Python 3.4.3 in Windows 7, on an i7 920)

I do understand that this isn't the fastest way to copy a list, but I'm trying to find out if I'm using numpy incorrectly. Or is it the case that numpy is slower for this kind of operation and is only more efficient in more complex operations?

EDIT:

I also tried the following, which does a direct copy via b[:] = a, and numpy is still about twice as slow:

import numpy as np
import time

def ver6():
    a = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
    b = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
    for i in range(1000000):
        b[:] = a

def ver7():
    a = np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
    b = np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
    for i in range(1000000):
        b[:] = a

t0 = time.time()
ver6()
t1 = time.time()
ver7()
t2 = time.time()

print(t1-t0)
print(t2-t1)

Output is:

0.36202096939086914
0.6750380992889404
CaptainCodeman asked Jan 26 '16 17:01



2 Answers

You're using NumPy wrong. NumPy's efficiency relies on doing as much work as possible in C-level loops instead of interpreted code. When you do

for j in range(40):
    b[j]=a[j]

that's an interpreted loop, with all the intrinsic interpreter overhead and more: NumPy's indexing logic is far more complex than list indexing, and NumPy needs to create a new element wrapper object on every element retrieval. You're not getting any of the benefits of NumPy when you write code like this.

You need to write the code in such a way that the work happens in C:

b[:] = a

This would also improve the efficiency of the list operation, but it's much more important for NumPy.
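As a rough way to check this on your own machine, the sketch below times both forms with timeit. The function names element_loop and slice_assign are made up for illustration, and the numbers will vary with hardware and NumPy version, so no timings are claimed here.

```python
import timeit
import numpy as np

a = np.arange(40)
b = np.zeros(40, dtype=a.dtype)

def element_loop():
    # 40 interpreted iterations; every a[j] and b[j] goes through
    # NumPy's indexing machinery one element at a time.
    for j in range(40):
        b[j] = a[j]

def slice_assign():
    # A single call; the 40-element copy runs inside NumPy's C code.
    b[:] = a

n = 10000
print("element loop:", timeit.timeit(element_loop, number=n))
print("slice assign:", timeit.timeit(slice_assign, number=n))
```

With only 40 elements, the fixed cost of each NumPy call can still be significant, which is consistent with the timings in the question's EDIT. The advantage of the C-level copy generally grows with array size.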

user2357112 supports Monica answered Oct 02 '22 05:10


Most of what you are seeing is Python object creation from C native types.

A Python list is, at its heart, an array of PyObject pointers. When a and b are both Python lists, doing b[i] = a[i] will imply:

  • decreasing the reference count of the object pointed to by b[i],
  • increasing the reference count of the object pointed to by a[i], and
  • copying the address stored in a[i] into b[i].
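The list case above can be observed from Python with sys.getrefcount. This is only an illustrative sketch: the variable names are made up, and a plain object() stands in for the stored integers so that no other references interfere with the count.

```python
import sys

obj = object()      # a fresh object that nothing else refers to
a = [obj]
b = [None]

before = sys.getrefcount(obj)
b[0] = a[0]         # copies the pointer; no new object is created
after = sys.getrefcount(obj)

print(after - before)   # 1: b[0] now holds one more reference to obj
print(b[0] is a[0])     # True: both slots point at the same object
```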

But if a and b are NumPy arrays, things are a little more elaborate, and the same b[i] = a[i] then requires:

  • creating a Python integer object from the native C integer type stored at a[i],
  • converting the Python integer object into a native C integer type, and storing its value in b[i], and
  • decreasing the reference count of the temporary Python integer object.

So the difference is mostly in creating and disposing of that intermediate Python object, which lists do not need to do.
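A small sketch that makes the intermediate object visible. One caveat worth stating: in current NumPy the temporary is a NumPy scalar (for example np.int64) rather than a built-in int, but the create-and-discard cost described above is the same idea. The variable names are illustrative only.

```python
import numpy as np

lst = [1000, 2000]
arr = np.array(lst)

# A list hands back the object it already stores.
print(lst[0] is lst[0])              # True

# An array stores raw C integers, so each read builds a fresh scalar object.
x = arr[0]
y = arr[0]
print(x is y)                        # False: two separate wrapper objects
print(isinstance(x, np.integer))     # True: a NumPy integer scalar
print(x == y)                        # True: same value, different objects
```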

Jaime answered Oct 02 '22 06:10