I just started using Numpy and noticed that iterating through each element in a Numpy array is ~4x slower than doing the same but with a list of lists. I know now that this defeats the purpose of Numpy and I should vectorize the function if possible. My question is though why is it 4x slower. That seems like quite a large amount.
I ran the tests below using %timeit
import numpy as np b = np.eye(1000) a = b.tolist() %timeit b[100][100] #1000000 loops, best of 3: 692 ns per loop %timeit a[100][100] #10000000 loops, best of 3: 70.7 ns per loop %timeit b[100,100] #1000000 loops, best of 3: 343 ns per loop %timeit b.item(100,100) #1000000 loops, best of 3: 297 ns per loop
I tried to use dis.dis
to see what was going on under the hood but got:
TypeError: don't know how to disassemble method-wrapper objects
Then I tried to look at the Numpy source code but couldn't figure out which file corresponded to array element access. I'm curious what accounts for the extra overhead, and more importantly how to figure this out for myself in the future. It seems like python can't be easily compiled to C code so that I can see the difference. But is there a way to see what byte code is generated for each line, to get a sense of the differences?
Because the Numpy array is densely packed in memory due to its homogeneous type, it also frees the memory faster. So overall a task executed in Numpy is around 5 to 100 times faster than the standard python list, which is a significant leap in terms of speed.
NumPy Arrays Are Faster Than Lists The array is randomly generated. As predicted, we can see that NumPy arrays are significantly faster than lists.
Bookmark this question. Show activity on this post. The np. apply_along_axis() function seems to be very slow (no output after 15 mins).
By explicitly declaring the "ndarray" data type, your array processing can be 1250x faster. This tutorial will show you how to speed up the processing of NumPy arrays using Cython. By explicitly specifying the data types of variables in Python, Cython can give drastic speed increases at runtime.
In summary: getting an item from a NumPy array requires new Python objects to be created, whereas this is not the case for lists. Also, indexing is more slightly more complicated for NumPy arrays than lists which may add some additional overhead.
To recap, the NumPy operations you have listed do the following:
b[100][100]
returns row 100 of b
as an array, and then gets the value at index 100 of this row, returning the value as an object (e.g. a np.int64
type).b[100,100]
returns the value at row 100 and column 100 directly (no intermediate array is returned first).b.item(100,100)
does the same as above b[100,100]
except that the value is converted to a native Python type and returned.Now of these operation, (1) is slowest because it requires two sequential NumPy indexing operations (I'll explain why this is slower than list indexing below). (2) is quickest because only a single indexing operation is performed. Operation (3) is possibly slower as it is a method call (these are generally slow in Python).
Why is list access still faster than b[100,100]
?
Python lists are arrays of pointers to objects in memory. For example, the list [1, 2, 3]
does not contain those integers directly, but rather pointers to the memory addresses were each integer object already exists. To get an item from the list, Python just returns a reference to the object.
NumPy arrays are not collections of objects. The array np.array([1, 2, 3])
is just a contiguous block of memory with bits set to represent those integer values. To get an integer from this array, a new Python object must be constructed in memory separate to the array. For instance, an object of np.int64
may be returned by the indexing operation: this object did not exist previously and had to be created.
Two other reasons why a[100][100]
(getting from the list) is quicker than b[100,100]
(getting from the array) are that:
The bytecode opcode BINARY_SUBSCR
is executed when indexing both lists and arrays, but it is optimised for the case of Python lists.
The internal C function handling integer indexing for Python lists is very short and simple. On the other hand, NumPy indexing is much more complicated and a significant amount of code is executed to determine the type of indexing being used so that the correct value can be returned.
Below, the steps for accessing elements in a list and array with a[100][100]
and b[100,100]
are described in more detail.
The same four bytecode opcodes are triggered for both lists and arrays:
0 LOAD_NAME 0 (a) # the list or array 3 LOAD_CONST 0 (100) # index number (tuple for b[100,100]) 6 BINARY_SUBSCR # find correct "getitem" function 7 RETURN_VALUE # value returned from list or array
Note: if you start chain indexing for multi-dimensional lists, e.g. a[100][100][100]
, you start to repeat these bytecode instructions. This does not happen for NumPy arrays using the tuple indexing: b[100,100,100]
uses just the four instructions. This is why the gap in the timings begins to close as the number of dimensions increases.
The functions for accessing lists and arrays are different and the correct one needs to be found in each case. This task is handled by the BINARY_SUBSCR
opcode:
w = POP(); // our index v = TOP(); // our list or NumPy array if (PyList_CheckExact(v) && PyInt_CheckExact(w)) { // do we have a list and an int? /* INLINE: list[int] */ Py_ssize_t i = PyInt_AsSsize_t(w); if (i < 0) i += PyList_GET_SIZE(v); if (i >= 0 && i < PyList_GET_SIZE(v)) { x = PyList_GET_ITEM(v, i); // call "getitem" for lists Py_INCREF(x); } else goto slow_get; } else slow_get: x = PyObject_GetItem(v, w); // else, call another function // to work out what is needed Py_DECREF(v); Py_DECREF(w); SET_TOP(x); if (x != NULL) continue; break;
This code is optimised for Python lists. If the function sees a list, it will quickly call the function PyList_GET_ITEM
. This list can now be accessed at the required index (see next section below).
However, if it doesn't see a list (e.g. we have a NumPy array), it takes the "slow_get" path. This in turn calls another function PyObject_GetItem
to check which "getitem" function the object is mapped to:
PyObject_GetItem(PyObject *o, PyObject *key) { PyMappingMethods *m; if (o == NULL || key == NULL) return null_error(); m = o->ob_type->tp_as_mapping; if (m && m->mp_subscript) return m->mp_subscript(o, key); ...
In the case of NumPy arrays, the correct function is located in mp_subscript
in the PyMappingMethods
structure.
Notice the additional function calls before this correct "get" function can be called. These calls add to the overhead for b[100]
, although how much will depend on how Python/NumPy was compiled, the system architecture, and so on.
Above it was seen that the function PyList_GET_ITEM
is called. This is a short function that essentially looks like this*:
PyList_GetItem(PyObject *op, Py_ssize_t i) { if (!PyList_Check(op)) { // check if list PyErr_BadInternalCall(); return NULL; } if (i < 0 || i >= Py_SIZE(op)) { // check i is in range if (indexerr == NULL) { indexerr = PyUnicode_FromString( "list index out of range"); if (indexerr == NULL) return NULL; } PyErr_SetObject(PyExc_IndexError, indexerr); return NULL; } return ((PyListObject *)op) -> ob_item[i]; // return reference to object }
* PyList_GET_ITEM
is actually the macro form of this function which does the same thing, minus error checking.
This means that getting the item at index i
of a Python list is relatively simple. Internally, Python checks whether the type of the item being is a list, whether i
is in the correct range for the list, and then returns the reference to the object in the list.
In contrast, NumPy has to do much more work before the value at the requested index can be returned.
Arrays can be indexed in a variety of different ways and NumPy has to decide which index routine is needed. The various indexing routines are handled largely by code in mapping.c
.
Anything used to index NumPy arrays passes through the function prepare_index
which begins the parsing of the index and stores the information about broadcasting, number of dimensions, and so on. Here is the call signature for the function:
NPY_NO_EXPORT int prepare_index(PyArrayObject *self, PyObject *index, npy_index_info *indices, int *num, int *ndim, int *out_fancy_ndim, int allow_boolean) /* @param the array being indexed * @param the index object * @param index info struct being filled (size of NPY_MAXDIMS * 2 + 1) * @param number of indices found * @param dimension of the indexing result * @param dimension of the fancy/advanced indices part * @param whether to allow the boolean special case */
The function has to do a lot of checks. Even for a relatively simple index such as b[100,100]
, a lot of information has to be inferred so that NumPy can return a reference (view) to the correct value.
In conclusion, it takes longer for the "getitem" function for NumPy to be found and the functions handling the indexing of arrays are necessarily more complex than the single function for Python lists.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With