Numpy individual element access slower than for lists

I just started using NumPy and noticed that accessing each element of a NumPy array individually is ~4x slower than doing the same with a list of lists. I now know that this defeats the purpose of NumPy and that I should vectorize the function if possible. My question, though, is why it is 4x slower. That seems like quite a large difference.

I ran the tests below using %timeit

    import numpy as np
    b = np.eye(1000)
    a = b.tolist()

    %timeit b[100][100]
    #1000000 loops, best of 3: 692 ns per loop
    %timeit a[100][100]
    #10000000 loops, best of 3: 70.7 ns per loop
    %timeit b[100,100]
    #1000000 loops, best of 3: 343 ns per loop
    %timeit b.item(100,100)
    #1000000 loops, best of 3: 297 ns per loop

I tried to use dis.dis to see what was going on under the hood but got:

TypeError: don't know how to disassemble method-wrapper objects

Then I tried to look at the NumPy source code but couldn't figure out which file corresponded to array element access. I'm curious what accounts for the extra overhead and, more importantly, how to figure this out for myself in the future. It seems that Python can't easily be compiled to C code so that I can see the difference. But is there a way to see what bytecode is generated for each line, to get a sense of the differences?

asked Mar 26 '15 14:03

emschorsch

1 Answer

In summary: getting an item from a NumPy array requires new Python objects to be created, whereas this is not the case for lists. Also, indexing is slightly more complicated for NumPy arrays than for lists, which may add some additional overhead.

To recap, the NumPy operations you have listed do the following:

1. b[100][100] returns row 100 of b as an array, and then gets the value at index 100 of this row, returning the value as a NumPy scalar object (np.float64 for the array in the question).
2. b[100,100] returns the value at row 100 and column 100 directly (no intermediate array is returned first).
3. b.item(100,100) does the same as b[100,100] above, except that the value is converted to a native Python type before it is returned.

Of these operations, (1) is the slowest because it requires two sequential NumPy indexing operations (I'll explain why this is slower than list indexing below). (2) needs only a single indexing operation. (3) also performs a single lookup, but it goes through a method call (which adds some overhead in Python) and converts the value to a native Python type; in the timings from the question it comes out slightly faster than (2).
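
As a quick check of what each expression hands back, here is a minimal sketch that reuses the a and b from the question. The other variable names are only illustrative, and the scalar type is np.float64 because np.eye produces a float64 array by default.

```python
import numpy as np

b = np.eye(1000)
a = b.tolist()

row = b[100]                # (1) first step: an intermediate 1-D array
chained = b[100][100]       # (1) second step: a NumPy scalar taken from that row
direct = b[100, 100]        # (2) a NumPy scalar, no intermediate array
native = b.item(100, 100)   # (3) a native Python float
from_list = a[100][100]     # list of lists: the float object already stored there

print(type(row), row.shape)
print(type(chained), type(direct), type(native), type(from_list))
```

All four element lookups compare equal (the value is 1.0 on the diagonal); only the type of the returned object differs.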

Why is list access still faster than b[100,100]?

Object creation

Python lists are arrays of pointers to objects in memory. For example, the list [1, 2, 3] does not contain those integers directly, but rather pointers to the memory addresses where each integer object already exists. To get an item from the list, Python just returns a reference to the object.

NumPy arrays are not collections of objects. The array np.array([1, 2, 3]) is just a contiguous block of memory with bits set to represent those integer values. To get an integer from this array, a new Python object must be constructed in memory, separate from the array. For instance, an object of type np.int64 may be returned by the indexing operation: this object did not exist previously and had to be created.
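
The difference can be observed with an identity check. This is a small sketch, not part of the original measurements: the names values and arr are illustrative, and dtype=np.int64 is set explicitly so that the scalar type does not depend on the platform.

```python
import numpy as np

values = [1000, 2000, 3000]
arr = np.array(values, dtype=np.int64)

# The list hands back a reference to the object it already stores,
# so two lookups yield the very same object.
same_list_object = values[0] is values[0]

# The array builds a fresh scalar object on every lookup,
# so two lookups yield equal but distinct objects.
first = arr[0]
second = arr[0]
same_array_object = first is second

print(same_list_object, same_array_object, type(first))
```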

Indexing complexity

Two other reasons why a[100][100] (getting from the list) is quicker than b[100,100] (getting from the array) are that:

The bytecode opcode BINARY_SUBSCR is executed when indexing both lists and arrays, but it is optimised for the case of Python lists.
The internal C function handling integer indexing for Python lists is very short and simple. On the other hand, NumPy indexing is much more complicated and a significant amount of code is executed to determine the type of indexing being used so that the correct value can be returned.

Below, the steps for accessing elements in a list and array with a[100][100] and b[100,100] are described in more detail.

Bytecode

The same four bytecode opcodes are triggered for both lists and arrays:

      0 LOAD_NAME                0 (a)           # the list or array
      3 LOAD_CONST               0 (100)         # index number (tuple for b[100,100])
      6 BINARY_SUBSCR                            # find correct "getitem" function
      7 RETURN_VALUE                             # value returned from list or array

Note: if you start chain indexing for multi-dimensional lists, e.g. a[100][100][100], you start to repeat these bytecode instructions. This does not happen for NumPy arrays using the tuple indexing: b[100,100,100] uses just the four instructions. This is why the gap in the timings begins to close as the number of dimensions increases.
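
To inspect the bytecode yourself, the standard dis module can work on a compiled expression rather than on a method-wrapper object. The sketch below uses a hypothetical helper, opnames, and only compiles the expressions, so a and b do not need to exist. The exact opcode names and offsets depend on the CPython version, so the listing may not match the one above exactly.

```python
import dis


def opnames(expression):
    """Compile an expression (without evaluating it) and list its opcode names."""
    code = compile(expression, "<example>", "eval")
    return [instruction.opname for instruction in dis.get_instructions(code)]


chained = opnames("a[100][100]")   # two subscript operations
tupled = opnames("b[100, 100]")    # one subscript operation with a tuple key

print(chained)
print(tupled)
```

In Python 3, dis.dis("a[100][100]") also accepts the source string directly and prints the full listing.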

Finding the correct "getitem" function

The functions for accessing lists and arrays are different and the correct one needs to be found in each case. This task is handled by the BINARY_SUBSCR opcode, shown here as it appears in the CPython 2 interpreter loop (note the PyInt_CheckExact call):

    w = POP();                                            // our index
    v = TOP();                                            // our list or NumPy array
    if (PyList_CheckExact(v) && PyInt_CheckExact(w)) {    // do we have a list and an int?
        /* INLINE: list[int] */
        Py_ssize_t i = PyInt_AsSsize_t(w);
        if (i < 0)
            i += PyList_GET_SIZE(v);
        if (i >= 0 && i < PyList_GET_SIZE(v)) {
            x = PyList_GET_ITEM(v, i);                    // call "getitem" for lists
            Py_INCREF(x);
        }
        else
            goto slow_get;
    }
    else
      slow_get:
        x = PyObject_GetItem(v, w);                       // else, call another function
                                                          // to work out what is needed
    Py_DECREF(v);
    Py_DECREF(w);
    SET_TOP(x);
    if (x != NULL) continue;
    break;

This code is optimised for Python lists. If it sees a list and an integer index, it goes straight to PyList_GET_ITEM, and the list can be accessed at the required index (see the section on lists below).

However, if it doesn't see a list (e.g. we have a NumPy array), it takes the "slow_get" path. This in turn calls another function PyObject_GetItem to check which "getitem" function the object is mapped to:

    PyObject_GetItem(PyObject *o, PyObject *key)
    {
        PyMappingMethods *m;

        if (o == NULL || key == NULL)
            return null_error();

        m = o->ob_type->tp_as_mapping;
        if (m && m->mp_subscript)
            return m->mp_subscript(o, key);
        ...

In the case of NumPy arrays, the correct function is located in mp_subscript in the PyMappingMethods structure.

Notice the additional function calls before this correct "get" function can be called. These calls add to the overhead for b[100], although how much will depend on how Python/NumPy was compiled, the system architecture, and so on.
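
For a class written in Python, the mp_subscript slot ends up calling the class's __getitem__ method, which makes the dispatch easy to observe. The sketch below uses a hypothetical class, KeyEcho, that simply returns the key it receives; it is not how NumPy is implemented, but it shows what the key object looks like by the time it reaches the "getitem" function.

```python
class KeyEcho:
    """Hypothetical container that returns whatever key it is indexed with."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()

single = echo[100]       # a plain int
pair = echo[100, 100]    # one tuple key, as with b[100,100]
sliced = echo[1:3]       # a slice object

print(single, pair, sliced)
```

Since b[100,100] arrives as a single tuple, the receiving function has to unpack and classify its contents itself, which is part of the work NumPy does on every lookup.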

Getting from a Python list

Above it was seen that the function PyList_GET_ITEM is called. This is a short function that essentially looks like this*:

    PyList_GetItem(PyObject *op, Py_ssize_t i)
    {
        if (!PyList_Check(op)) {                            // check if list
            PyErr_BadInternalCall();
            return NULL;
        }
        if (i < 0 || i >= Py_SIZE(op)) {                    // check i is in range
            if (indexerr == NULL) {
                indexerr = PyUnicode_FromString(
                    "list index out of range");
                if (indexerr == NULL)
                    return NULL;
            }
            PyErr_SetObject(PyExc_IndexError, indexerr);
            return NULL;
        }
        return ((PyListObject *)op) -> ob_item[i];           // return reference to object
    }

* PyList_GET_ITEM is actually the macro form of this function which does the same thing, minus error checking.

This means that getting the item at index i of a Python list is relatively simple. Internally, Python checks that the object being indexed is a list and that i is within range for the list, and then returns a reference to the object stored at that position.

Getting from a NumPy array

In contrast, NumPy has to do much more work before the value at the requested index can be returned.

Arrays can be indexed in a variety of different ways and NumPy has to decide which index routine is needed. The various indexing routines are handled largely by code in mapping.c.

Anything used to index NumPy arrays passes through the function prepare_index which begins the parsing of the index and stores the information about broadcasting, number of dimensions, and so on. Here is the call signature for the function:

    NPY_NO_EXPORT int
    prepare_index(PyArrayObject *self, PyObject *index,
                  npy_index_info *indices,
                  int *num, int *ndim, int *out_fancy_ndim, int allow_boolean)

    /* @param the array being indexed
     * @param the index object
     * @param index info struct being filled (size of NPY_MAXDIMS * 2 + 1)
     * @param number of indices found
     * @param dimension of the indexing result
     * @param dimension of the fancy/advanced indices part
     * @param whether to allow the boolean special case
     */

The function has to do a lot of checks. Even for a relatively simple index such as b[100,100], a lot of information has to be inferred so that NumPy can locate and return the correct value.
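
To give a sense of what the index parsing has to tell apart, here is a small sketch of several index forms that all go through the same square-bracket syntax. A 4x4 identity matrix is used instead of the 1000x1000 one to keep the results easy to read, and the variable names are illustrative.

```python
import numpy as np

b = np.eye(4)

scalar = b[1, 1]                 # full integer index: a NumPy scalar
row = b[1]                       # partial integer index: a 1-D view
block = b[1:3, ::2]              # slices: a 2-D view
picked = b[[0, 2], [0, 2]]       # integer lists ("fancy" indexing): a 1-D copy
masked = b[b > 0]                # boolean mask: a 1-D copy
expanded = b[..., np.newaxis]    # Ellipsis and newaxis: a 3-D view

for name, result in [("row", row), ("block", block), ("picked", picked),
                     ("masked", masked), ("expanded", expanded)]:
    print(name, result.shape, np.shares_memory(result, b))
```

Each form produces a different shape and either a view or a copy, and NumPy can only know which one applies after it has inspected the index object.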

In conclusion, it takes longer for the "getitem" function for NumPy to be found and the functions handling the indexing of arrays are necessarily more complex than the single function for Python lists.
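
To repeat the comparison outside IPython, the standard timeit module can be used in place of %timeit. This is a rough sketch under a few assumptions: the loop count of 100,000 is arbitrary, the names statements and timings are illustrative, and the absolute numbers will vary with the machine, the Python version and the NumPy version.

```python
import timeit

import numpy as np

b = np.eye(1000)
a = b.tolist()

statements = ["a[100][100]", "b[100][100]", "b[100, 100]", "b.item(100, 100)"]
number = 100_000  # how many times each statement is executed

timings = {}
for statement in statements:
    seconds = timeit.timeit(statement, globals={"a": a, "b": b}, number=number)
    timings[statement] = seconds / number * 1e9  # average nanoseconds per call

for statement, nanoseconds in timings.items():
    print(f"{statement:<18} {nanoseconds:8.1f} ns")
```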

answered Sep 23 '22 21:09

Alex Riley


Tags: python, arrays, list, numpy