I have the following code in Python that, at least for me, produces strange results:
import numpy as np
import timeit
a = np.random.rand(3,2)
print(timeit.timeit('a[2,1] + 1', 'from __main__ import a', number=1000000))
print(timeit.timeit('a.item((2,1)) + 1', 'from __main__ import a', number=1000000))
This gives the following results:
0.533630132675
0.103801012039
It seems fine if I only access the numpy element, but as soon as I add 1 to that element the timings diverge... Why is there such a difference in timings?
In this case, they don't return quite the same thing. a[2,1] returns a numpy.float64, while a.item((2,1)) returns a native python float.
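You can confirm the type difference directly (a minimal check, reusing the same 3x2 array a from the question):
import numpy as np
a = np.random.rand(3, 2)
print(type(a[2, 1]))         # <class 'numpy.float64'>
print(type(a.item((2, 1))))  # <class 'float'>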
A numpy scalar (numpy.float64, numpy.int64, etc.) isn't quite identical to a native python float (they behave identically, however). Simple operations on a single element will be faster with a native python float, as there's less indirection. Have a look at the docstring for ndarray.item for a bit more detail.
As an example of the difference in speed, consider the following:
In [1]: x = 1.2
In [2]: y = np.float64(1.2)
In [3]: %timeit x + 1
10000000 loops, best of 3: 58.9 ns per loop
In [4]: %timeit y + 1
1000000 loops, best of 3: 241 ns per loop
Initially, I incorrectly stated that a second factor was that a.item(...) was slightly faster than a[...]. That actually isn't true. The time it takes for a.item to convert the numpy scalar into a native python scalar overwhelms the time it takes for the additional logic in a[...]/a.__getitem__(...).
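You can check this yourself by timing the raw access without the addition (a sketch; the absolute numbers will vary by machine and numpy version):
import numpy as np
import timeit
a = np.random.rand(3, 2)
setup = 'from __main__ import a'
# a[2, 1] goes through the general __getitem__ machinery but returns a numpy scalar.
print(timeit.timeit('a[2, 1]', setup, number=1000000))
# a.item((2, 1)) additionally converts the result to a native python float.
print(timeit.timeit('a.item((2, 1))', setup, number=1000000))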
However, you should be careful about trying to generalize what happens with numpy scalars to how numpy arrays operate as a whole. If you're doing a lot of single-item indexing in numpy, it's generally an anti-pattern.
For example, compare:
In [5]: a = np.random.rand(1000)
In [6]: %timeit a + 1
100000 loops, best of 3: 2.32 us per loop
No matter what we do, we won't be able to match the speed (or the much lower memory usage) of the vectorized version a + 1 above:
In [7]: %timeit [x + 1 for x in a]
1000 loops, best of 3: 257 us per loop
In [8]: %timeit [a.item(i) + 1 for i in range(len(a))]
1000 loops, best of 3: 208 us per loop
Some of this is because iterating through ndarrays is slower than iterating through a list. For a completely fair comparison, let's convert everything over to a list of native python floats:
In [9]: b = a.tolist()
In [10]: type(b[0])
Out[10]: float
In [11]: %timeit [x + 1 for x in b]
10000 loops, best of 3: 69.4 us per loop
Clearly, using vectorized operations (the first case) is much faster when you're operating on larger arrays. It's also far more memory efficient, as lists require storing pointers to each item, while ndarrays are contiguous in memory.
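To get a rough sense of the memory difference (a sketch; note that sys.getsizeof on a list counts only the list object and its pointer array, not the float objects the pointers refer to):
import sys
import numpy as np
a = np.random.rand(1000)
b = a.tolist()
print(a.nbytes)          # 8000: 1000 contiguous 8-byte float64 values
print(sys.getsizeof(b))  # the list object and its pointer array alone
# Counting the boxed python float objects as well:
print(sys.getsizeof(b) + sum(sys.getsizeof(x) for x in b))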