Numpy item faster than operator[]

I have the following code in Python that, at least for me, produces strange results:

import numpy as np
import timeit

a = np.random.rand(3,2)

print(timeit.timeit('a[2,1] + 1', 'from __main__ import a', number=1000000))
print(timeit.timeit('a.item((2,1)) + 1', 'from __main__ import a', number=1000000))

This gives the result:

0.533630132675
0.103801012039

It seems fine if I only access the numpy element, but when I add 1 to that element the timings get strange... Why is there such a difference in timings?

asked Sep 01 '15 by jerdna

1 Answer

In this case, they don't return quite the same thing. a[2,1] returns a numpy.float64, while a.item((2,1)) returns a native python float.
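You can verify this directly (using the same a as in the question):

import numpy as np

a = np.random.rand(3, 2)

print(type(a[2, 1]))         # <class 'numpy.float64'>
print(type(a.item((2, 1))))  # <class 'float'>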

Native vs numpy scalars (float, int, etc)

A numpy.float64 scalar isn't quite identical to a native python float, even though it behaves identically in most situations. Simple operations on a single element will be faster with a native python float, as there's less indirection. Have a look at the docstring for ndarray.item for a bit more detail.
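Part of why they behave identically is that numpy.float64 is actually a subclass of the builtin float:

import numpy as np

# numpy's double-precision scalar type inherits from python's float
print(issubclass(np.float64, float))        # True
print(isinstance(np.float64(1.2), float))   # True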

As an example of the difference in speed, consider the following:

In [1]: x = 1.2

In [2]: y = np.float64(1.2)

In [3]: %timeit x + 1
10000000 loops, best of 3: 58.9 ns per loop

In [4]: %timeit y + 1
1000000 loops, best of 3: 241 ns per loop

Initially, I stated that a second factor was that a.item(...) was slightly faster than a[...]. That actually isn't true: the time it takes for a.item to convert the numpy scalar into a native python scalar overwhelms the time taken by the additional logic in a[...]/a.__getitem__(...).
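You can see this by timing the indexing on its own, without the addition (a rough sketch; exact numbers will vary by machine and numpy version):

import numpy as np
import timeit

a = np.random.rand(3, 2)

# Plain indexing returns a numpy.float64 and skips the conversion to a
# native python float, so it is typically faster than a.item here...
print(timeit.timeit('a[2,1]', 'from __main__ import a', number=1000000))
print(timeit.timeit('a.item((2,1))', 'from __main__ import a', number=1000000))

# ...but once the result is used in arithmetic, the cheaper native-float
# addition makes the a.item version win overall, as in the question.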


Don't generalize this result to more than one item

However, you should be careful about trying to generalize what happens with numpy scalars to how numpy arrays operate as a whole. If you're doing a lot of single-item indexing in numpy, it's generally an anti-pattern.

For example, compare:

In [5]: a = np.random.rand(1000)

In [6]: %timeit a + 1
100000 loops, best of 3: 2.32 us per loop

No matter what we do, we won't be able to match the speed (or the much lower memory usage) of the vectorized version (a + 1) above:

In [7]: %timeit [x + 1 for x in a]
1000 loops, best of 3: 257 us per loop

In [8]: %timeit [a.item(i) + 1 for i in range(len(a))]
1000 loops, best of 3: 208 us per loop

Some of this is because iterating through ndarrays is slower than iterating through a list. For a completely fair comparison, let's convert everything over to a list of native python floats:

In [9]: b = a.tolist()

In [10]: type(b[0])
Out[10]: float

In [11]: %timeit [x + 1 for x in b]
10000 loops, best of 3: 69.4 us per loop

Clearly, using vectorized operations (the first case) is much faster when you're operating on larger arrays. It's also far more memory efficient, as lists require storing pointers to each item, while ndarrays are contiguous in memory.
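As a rough illustration of the memory difference (a sketch; exact per-object sizes depend on the interpreter build):

import sys
import numpy as np

a = np.random.rand(1000)
b = a.tolist()

# The ndarray stores 1000 contiguous 8-byte doubles.
print(a.nbytes)  # 8000

# The list stores a pointer per item plus a full python float object
# per element, so it is several times larger overall.
print(sys.getsizeof(b) + sum(sys.getsizeof(x) for x in b))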

answered Oct 19 '22 by Joe Kington