I've found that len(arr) is almost twice as fast as arr.shape[0] and am wondering why.
I am using Python 3.5.2, NumPy 1.14.2, and IPython 6.3.1.
The code below demonstrates this:
import numpy as np
arr = np.random.randint(1, 11, size=(3, 4, 5))
%timeit len(arr)
# 62.6 ns ± 0.239 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit arr.shape[0]
# 102 ns ± 0.163 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
I've also done some more tests for comparison:
class Foo():
    def __init__(self):
        self.shape = (3, 4, 5)

foo = Foo()
%timeit arr.shape
# 75.6 ns ± 0.107 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit foo.shape
# 61.2 ns ± 0.281 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit foo.shape[0]
# 78.6 ns ± 1.03 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
So I have two questions:
1) Why does len(arr) work faster than arr.shape[0]? (I would have thought len would be slower because of the function call.)
2) Why does foo.shape[0] work faster than arr.shape[0]? (In other words, what overhead do NumPy arrays incur in this case?)
NumPy is fast because it can do all its calculations without calling back into Python. Since this function involves looping in Python, we lose all the performance benefits of using NumPy. For a 10,000,000-entry NumPy array, this function takes 2.5 seconds to run on my computer.
The following code multiplies each element of an array by the corresponding element in another array and then sums all the individual products. Once again, the NumPy version was about 100 times faster than iterating over a list.
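The multiply-then-sum operation described above can be sketched as follows. The input values and variable names are illustrative assumptions, and the snippet only shows that the two approaches agree. It does not attempt to reproduce the timing claim.

```python
import numpy as np

# Illustrative inputs; any two equal-length sequences would do.
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Pure-Python version: multiply element pairs, then sum the products.
loop_total = sum(x * y for x, y in zip(a, b))

# NumPy version: the same computation runs inside compiled code.
np_total = float(np.dot(np.array(a), np.array(b)))

print(loop_total)  # 32.0
print(np_total)    # 32.0
```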
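You can use the built-in len() function on NumPy arrays, but NumPy arrays also have a .size attribute that gives the number of elements. Both return 8 here, the number of elements in the array. For a multidimensional array the two differ: len() returns only the length of the first axis, while .size counts every element.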
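A small sketch of how len(), .size and .shape[0] relate. The array names and contents are assumptions chosen for illustration.

```python
import numpy as np

flat = np.arange(8)                     # one-dimensional, 8 elements
grid = np.arange(60).reshape(3, 4, 5)   # three-dimensional, 3 * 4 * 5 elements

# For a 1-D array, len() and .size agree.
print(len(flat), flat.size)                  # 8 8

# For an N-D array, len() is the length of the first axis only.
print(len(grid), grid.shape[0], grid.size)   # 3 3 60
```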
NumPy uses much less memory to store data. NumPy arrays take significantly less memory than Python lists. NumPy also provides a mechanism for specifying the data type of the contents, which allows further optimisation of the code.
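As a rough illustration of the memory point, the sketch below compares an estimate of a list's footprint with the data buffer of two arrays. The sizes and dtypes are assumptions, and the list figure is only an approximation, because sys.getsizeof does not account for object sharing (CPython caches small integers, for example).

```python
import sys
import numpy as np

values = list(range(1000))

# Rough list footprint: the list's pointer array plus each int object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Array data buffers: element count times the item size of the dtype.
as_int64 = np.array(values, dtype=np.int64)
as_int16 = np.array(values, dtype=np.int16)  # 0..999 fits in 16 bits

print(list_bytes)        # platform dependent
print(as_int64.nbytes)   # 8000
print(as_int16.nbytes)   # 2000
```

Choosing a narrower dtype is only safe when every value fits in it.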
The NumPy array data structure is implemented in C. The dimensions of the array are stored in a C structure. They are not stored in a Python tuple. So each time you read the shape attribute, a new Python tuple of new Python integer objects is created. When you use arr.shape[0], that tuple is then indexed to pull out the first element, which adds a little more overhead. len(arr) only has to create a Python integer.
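To make the difference concrete, here is a pure-Python analogy. DimsHolder is a hypothetical class invented for this illustration. It is not how NumPy is implemented, but it mimics the behaviour described above: the dimensions live in private storage, shape builds a fresh tuple on every read, and len() returns the first dimension directly.

```python
class DimsHolder:
    """Hypothetical stand-in for an array object (not NumPy's actual code)."""

    def __init__(self, *dims):
        self._dims = list(dims)   # plays the role of the internal C structure

    @property
    def shape(self):
        # A new tuple is built on every attribute read.
        return tuple(self._dims)

    def __len__(self):
        # No tuple is built; only the first dimension is returned.
        return self._dims[0]


holder = DimsHolder(3, 4, 5)
first = holder.shape
second = holder.shape

print(first == second)               # True: same contents
print(first is second)               # False: two separate tuple objects
print(len(holder), holder.shape[0])  # 3 3
```

In this model, holder.shape[0] does strictly more work than len(holder): it constructs a tuple and then indexes it.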
You can easily verify that arr.shape creates a new tuple each time it is read:
In [126]: arr = np.random.randint(1, 11, size=(3, 4, 5))
In [127]: s1 = arr.shape
In [128]: id(s1)
Out[128]: 4916019848
In [129]: s2 = arr.shape
In [130]: id(s2)
Out[130]: 4909905024
s1 and s2 have different ids; they are different tuple objects.
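Outside IPython, the same check and a rough timing comparison can be sketched with the standard-library timeit module. The loop count is an arbitrary assumption, and the absolute numbers will vary with the machine and with the Python and NumPy versions, so no expected timings are shown.

```python
import timeit

import numpy as np

arr = np.random.randint(1, 11, size=(3, 4, 5))

# Hold both tuples at once so the identity comparison is meaningful.
s1 = arr.shape
s2 = arr.shape
print(s1 == s2)   # True: equal contents
print(s1 is s2)   # False: distinct tuple objects

# Rough timing comparison; n is an arbitrary choice.
n = 200_000
t_len = timeit.timeit("len(arr)", globals={"arr": arr}, number=n)
t_shape0 = timeit.timeit("arr.shape[0]", globals={"arr": arr}, number=n)

print("len(arr):     {:.1f} ns per call".format(t_len / n * 1e9))
print("arr.shape[0]: {:.1f} ns per call".format(t_shape0 / n * 1e9))
```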