I was under the impression that numpy would be faster for list operations, but the following example seems to indicate otherwise:
import numpy as np
import time
def ver1():
    a = [i for i in range(40)]
    b = [0 for i in range(40)]
    for i in range(1000000):
        for j in range(40):
            b[j] = a[j]

def ver2():
    a = np.array([i for i in range(40)])
    b = np.array([0 for i in range(40)])
    for i in range(1000000):
        for j in range(40):
            b[j] = a[j]

t0 = time.time()
ver1()
t1 = time.time()
ver2()
t2 = time.time()
print(t1-t0)
print(t2-t1)
Output is:
4.872278928756714
9.120521068572998
(I'm running 64-bit Python 3.4.3 on Windows 7, on an i7 920.)
I do understand that this isn't the fastest way to copy a list, but I'm trying to find out if I'm using numpy incorrectly. Or is it the case that numpy is slower for this kind of operation and is only more efficient in more complex operations?
EDIT:
I also tried the following, which just does a direct copy via b[:] = a, and numpy is still about twice as slow:
import numpy as np
import time
def ver6():
    a = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
    b = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
    for i in range(1000000):
        b[:] = a

def ver7():
    a = np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
    b = np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
    for i in range(1000000):
        b[:] = a

t0 = time.time()
ver6()
t1 = time.time()
ver7()
t2 = time.time()
print(t1-t0)
print(t2-t1)
Output is:
0.36202096939086914
0.6750380992889404
You're using NumPy wrong. NumPy's efficiency relies on doing as much work as possible in C-level loops instead of interpreted code. When you do
for j in range(40):
    b[j] = a[j]
That's an interpreted loop, with all the intrinsic interpreter overhead and more, because NumPy's indexing logic is way more complex than list indexing, and NumPy needs to create a new element wrapper object on every element retrieval. You're not getting any of the benefits of NumPy when you write code like this.
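You can observe that wrapper-object creation directly (a minimal check, not from the original post; the names a and lst are just for illustration):

import numpy as np

a = np.arange(40)
lst = list(range(40))

# Indexing a NumPy array materializes a fresh scalar wrapper object
# around the raw C value, so two reads give two distinct objects.
print(type(a[0]))        # numpy.int64 (or numpy.int32 on some platforms)
print(a[0] is a[0])      # False: a new wrapper object per access
print(lst[0] is lst[0])  # True: the list hands back the stored object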
You need to write the code in such a way that the work happens in C:
b[:] = a
This would also improve the efficiency of the list operation, but it's much more important for NumPy.
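For a sense of scale, here is a rough timeit sketch (numbers will vary by machine) comparing the slice copy at a few sizes; NumPy's fixed per-call overhead loses at 40 elements, but the dense C-level copy pulls ahead as the array grows:

import timeit

# Compare the C-level slice copy b[:] = a for lists and NumPy arrays.
# At tiny sizes NumPy's per-call overhead dominates; at larger sizes
# the dense, homogeneous copy wins.
for n in (40, 1000, 100000):
    t_list = timeit.timeit(
        "b[:] = a",
        setup="a = list(range(%d)); b = [0] * %d" % (n, n),
        number=10000)
    t_numpy = timeit.timeit(
        "b[:] = a",
        setup=("import numpy as np; "
               "a = np.arange(%d); b = np.zeros(%d, dtype=a.dtype)" % (n, n)),
        number=10000)
    print(n, t_list, t_numpy)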
Most of what you are seeing is Python object creation from C native types.

A Python list is, at its heart, an array of PyObject pointers. When a and b are both Python lists, doing b[i] = a[i] implies:

- decreasing the reference count of the object currently stored in b[i],
- increasing the reference count of the object stored in a[i], and
- copying the pointer stored in a[i] into b[i].

But if a and b are NumPy arrays, things are a little more elaborate, and the same b[i] = a[i] then requires:

- creating a Python object wrapping the native value stored at a[i], see this,
- parsing that Python object back into a native value to store in b[i], see here, and
- decreasing the reference count of the temporary Python object.

So the difference is mostly in creating and disposing of that intermediate Python object, which lists do not need to do.
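If you genuinely need a Python-level element loop, one way to sidestep the per-element wrapper cost (a sketch, not part of the original answer) is to convert the source array to plain Python ints in a single C pass with tolist():

import timeit

setup = ("import numpy as np; "
         "a = np.arange(40); b = np.zeros(40, dtype=a.dtype)")

# Element-wise copy through NumPy indexing: one wrapper object is
# created and destroyed per iteration.
loop_numpy = """
for j in range(40):
    b[j] = a[j]
"""

# tolist() extracts all values as plain Python ints in one C pass, so
# the read side of the loop no longer builds a wrapper per element.
loop_tolist = """
vals = a.tolist()
for j in range(40):
    b[j] = vals[j]
"""

print(timeit.timeit(loop_numpy, setup=setup, number=100000))
print(timeit.timeit(loop_tolist, setup=setup, number=100000))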