NumPy Arrays are faster than Python Lists because of the following reasons: An array is a collection of homogeneous data-types that are stored in contiguous memory locations. On the other hand, a list in Python is a collection of heterogeneous data types stored in non-contiguous memory locations.
Two reasons, why the List might be slightly slower: To look up a element in the list, a method of List is called, which does the look up in the underlying array. So you need an additional method call there. On the other hand the compiler might recognize this and optimize the "unnecessary" call away.
NumPy random for generating an array of random numbers ndarray of 1000 random numbers. The reason why NumPy is fast when used right is that its arrays are extremely efficient. They are like C arrays instead of Python lists.
Creating a tuple is faster than creating a list. Creating a list is slower because two memory blocks need to be accessed. An element in a tuple cannot be removed or replaced.
The storage is "unboxed", but every time you access an element Python has to "box" it (embed it in a regular Python object) in order to do anything with it. For example, your sum(A)
iterates over the array, and boxes each integer, one at a time, in a regular Python int
object. That costs time. In your sum(L)
, all the boxing was done at the time the list was created.
So, in the end, an array is generally slower, but requires substantially less memory.
Here's the relevant code from a recent version of Python 3, but the same basic ideas apply to all CPython implementations since Python was first released.
Here's the code to access a list item:
PyObject *
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
/* error checking omitted */
return ((PyListObject *)op) -> ob_item[i];
}
There's very little to it: somelist[i]
just returns the i
'th object in the list (and all Python objects in CPython are pointers to a struct whose initial segment conforms to the layout of a struct PyObject
).
And here's the __getitem__
implementation for an array
with type code l
:
static PyObject *
l_getitem(arrayobject *ap, Py_ssize_t i)
{
return PyLong_FromLong(((long *)ap->ob_item)[i]);
}
The raw memory is treated as a vector of platform-native C
long
integers; the i
'th C long
is read up; and then PyLong_FromLong()
is called to wrap ("box") the native C long
in a Python long
object (which, in Python 3, which eliminates Python 2's distinction between int
and long
, is actually shown as type int
).
This boxing has to allocate new memory for a Python int
object, and spray the native C long
's bits into it. In the context of the original example, this object's lifetime is very brief (just long enough for sum()
to add the contents into a running total), and then more time is required to deallocate the new int
object.
This is where the speed difference comes from, always has come from, and always will come from in the CPython implementation.
To add to Tim Peters' excellent answer, arrays implement the buffer protocol, while lists do not. This means that, if you are writing a C extension (or the moral equivalent, such as writing a Cython module), then you can access and work with the elements of an array much faster than anything Python can do. This will give you considerable speed improvements, possibly well over an order of magnitude. However, it has a number of downsides:
Going straight to C extensions may be using a sledgehammer to swat a fly, depending on your use case. You should first investigate NumPy and see if it is powerful enough to do whatever math you're trying to do. It will also be much faster than native Python, if used correctly.
Tim Peters answered why this is slow, but let's see how to improve it.
Sticking to your example of sum(range(...))
(factor 10 smaller than your example to fit into memory here):
import numpy
import array
L = list(range(10**7))
A = array.array('l', L)
N = numpy.array(L)
%timeit sum(L)
10 loops, best of 3: 101 ms per loop
%timeit sum(A)
1 loop, best of 3: 237 ms per loop
%timeit sum(N)
1 loop, best of 3: 743 ms per loop
This way also numpy needs to box/unbox, which has additional overhead. To make it fast one has to stay within the numpy c code:
%timeit N.sum()
100 loops, best of 3: 6.27 ms per loop
So from the list solution to the numpy version this is a factor 16 in runtime.
Let's also check how long creating those data structures takes
%timeit list(range(10**7))
1 loop, best of 3: 283 ms per loop
%timeit array.array('l', range(10**7))
1 loop, best of 3: 884 ms per loop
%timeit numpy.array(range(10**7))
1 loop, best of 3: 1.49 s per loop
%timeit numpy.arange(10**7)
10 loops, best of 3: 21.7 ms per loop
Clear winner: Numpy
Also note that creating the data structure takes about as much time as summing, if not more. Allocating memory is slow.
Memory usage of those:
sys.getsizeof(L)
90000112
sys.getsizeof(A)
81940352
sys.getsizeof(N)
80000096
So these take 8 bytes per number with varying overhead. For the range we use 32bit ints are sufficient, so we can safe some memory.
N=numpy.arange(10**7, dtype=numpy.int32)
sys.getsizeof(N)
40000096
%timeit N.sum()
100 loops, best of 3: 8.35 ms per loop
But it turns out that adding 64bit ints is faster than 32bit ints on my machine, so this is only worth it if you are limited by memory/bandwidth.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With