I have the following code in Python that, at least for me, produces strange results:
import numpy as np
import timeit

a = np.random.rand(3, 2)
print(timeit.timeit('a[2,1] + 1', 'from __main__ import a', number=1000000))
print(timeit.timeit('a.item((2,1)) + 1', 'from __main__ import a', number=1000000))
This gives the result:
0.533630132675
0.103801012039
The timings seem fine if I only access the element, but as soon as I add 1 to it they diverge. Why is there such a difference in timings?
In this case, they don't return quite the same thing. a[2,1] returns a numpy.float64, while a.item((2,1)) returns a native python float. numpy scalars (numpy.float64, numpy.int64, etc.) aren't quite identical to native python types, although they behave identically. Simple operations on a single element will be faster with a native python float, as there's less indirection. Have a look at the docstring for ndarray.item for a bit more detail.
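You can verify the type difference yourself with a quick check (a minimal sketch):

import numpy as np

a = np.random.rand(3, 2)
print(type(a[2, 1]))         # <class 'numpy.float64'> -- a numpy scalar
print(type(a.item((2, 1))))  # <class 'float'> -- a native python float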
As an example of the difference in speed, consider the following:
In [1]: x = 1.2
In [2]: y = np.float64(1.2)
In [3]: %timeit x + 1
10000000 loops, best of 3: 58.9 ns per loop
In [4]: %timeit y + 1
1000000 loops, best of 3: 241 ns per loop
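Note that the result of an operation on a numpy scalar is itself a numpy scalar, so the extra indirection carries over into every subsequent operation (a small sketch to illustrate):

import numpy as np

x = 1.2
y = np.float64(1.2)
print(type(x + 1))  # <class 'float'> -- stays a native float
print(type(y + 1))  # <class 'numpy.float64'> -- still a numpy scalar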
Initially, I incorrectly stated that a second factor was that a.item(...) was slightly faster than a[...]. That actually isn't true. The time it takes for a.item to convert the numpy scalar into a native python scalar overwhelms the time it takes for the additional logic in a[...]/a.__getitem__(...).
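If you want to check this on your own machine, time the lookups alone, without the addition (no numbers shown here, since they'll vary by machine and numpy version):

import numpy as np
import timeit

a = np.random.rand(3, 2)
# Plain indexing: returns a numpy.float64
print(timeit.timeit('a[2,1]', 'from __main__ import a', number=1000000))
# item(): does the lookup plus a conversion to a native python float
print(timeit.timeit('a.item((2,1))', 'from __main__ import a', number=1000000))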
However, you should be careful about trying to generalize what happens with numpy scalars to how numpy arrays operate as a whole. If you're doing a lot of single-item indexing in numpy, it's generally an anti-pattern.
For example, compare:
In [5]: a = np.random.rand(1000)
In [6]: %timeit a + 1
100000 loops, best of 3: 2.32 us per loop
No matter what we do, we won't be able to match the speed (or much lower memory usage) of the vectorized version (a + 1) above:
In [7]: %timeit [x + 1 for x in a]
1000 loops, best of 3: 257 us per loop
In [8]: %timeit [a.item(i) + 1 for i in range(len(a))]
1000 loops, best of 3: 208 us per loop
Some of this is because iterating through ndarrays is slower than iterating through a list. For a completely fair comparison, let's convert everything over to a list of native python floats:
In [9]: b = a.tolist()
In [10]: type(b[0])
Out[10]: float
In [11]: %timeit [x + 1 for x in b]
10000 loops, best of 3: 69.4 us per loop
Clearly, using vectorized operations (the first case) is much faster when you're operating on larger arrays. It's also far more memory efficient, as lists require storing pointers to each item, while ndarrays are contiguous in memory.
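To get a rough sense of the memory difference, compare the array's raw buffer to the list's pointers plus the per-float objects (a back-of-the-envelope sketch; sys.getsizeof only counts object headers and containers, not allocator overhead):

import sys
import numpy as np

a = np.random.rand(1000)
b = a.tolist()
print(a.nbytes)  # 8000 -- 1000 contiguous float64 values, 8 bytes each
# The list stores a pointer per item, and each float is a separate heap object
print(sys.getsizeof(b) + sum(sys.getsizeof(x) for x in b))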