
Using numpy ufuncs vs built-in operators

Tags: python, numpy

I'm curious about the benefits and tradeoffs of using numpy ufuncs vs. the built-in operators vs. the 'function' versions of the built-in operators.

I'm curious about all ufuncs. Maybe there are times when some are more useful than others. However, I'll use < for my examples just for simplicity.

There are several ways to 'filter' a numpy array by a single number to get a boolean array. Each form gives the same results, but is there a preferred time/place to use one over the other? In this example I'm comparing an array against a single number, so all 3 will work.

Consider all examples using the following array:

>>> import numpy
>>> x = numpy.arange(0, 10000)
>>> x
array([   0,    1,    2, ..., 9997, 9998, 9999])

'<' operator

>>> x < 5000
array([ True,  True,  True, ..., False, False, False], dtype=bool)
>>> %timeit x < 5000
100000 loops, best of 3: 15.3 us per loop

operator.lt

>>> import operator
>>> operator.lt(x, 5000)
array([ True,  True,  True, ..., False, False, False], dtype=bool)
>>> %timeit operator.lt(x, 5000)
100000 loops, best of 3: 15.3 us per loop

numpy.less

>>> numpy.less(x, 5000)
array([ True,  True,  True, ..., False, False, False], dtype=bool)
>>> %timeit numpy.less(x, 5000)
100000 loops, best of 3: 15 us per loop

Note that all of them achieve pretty much equivalent performance and exactly the same results. I'm guessing that all of these calls end up in the same function anyway, since < and operator.lt both map to __lt__ on a numpy array, which is probably implemented using numpy.less or an equivalent?
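As a quick sanity check of the "exactly the same results" claim, here is a minimal sketch that compares the three spellings element by element. It only confirms that the outputs agree; it does not show which internal function handles each call. The variable names are made up for illustration.

```python
import operator

import numpy as np

x = np.arange(0, 10000)

# Three spellings of the same element-wise comparison.
by_operator = x < 5000
by_operator_module = operator.lt(x, 5000)
by_ufunc = np.less(x, 5000)

# array_equal checks both shape and every element.
all_equal = (
    np.array_equal(by_operator, by_operator_module)
    and np.array_equal(by_operator, by_ufunc)
)
print(all_equal, by_ufunc.dtype)
```

All three results are boolean arrays of the same shape, so any of them can be used directly as a mask, e.g. x[by_operator].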

So, which is more 'idiomatic' and 'preferred'?

durden2.0 asked Mar 28 '13 18:03



1 Answer

Generally speaking, and keeping the "readability counts" mantra in mind, the actual operator should be your preferred choice. The operator module versions have their place when you can replace lambda a, b: a < b with the more compact operator.lt, but not much outside of that. And you really shouldn't make explicit calls to the corresponding ufunc unless you want to use the out parameter to store the calculated values directly in an existing array.
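The two exceptions mentioned above can be sketched as follows. This is only an illustration with made-up data: the first part passes operator.lt where a two-argument lambda would otherwise go, and the second part uses the out parameter of np.less to write into a preallocated boolean array.

```python
import operator

import numpy as np

# 1) operator.lt as a compact stand-in for lambda a, b: a < b
left = [1, 3, 5]
right = [2, 3, 4]
with_lambda = list(map(lambda a, b: a < b, left, right))
with_operator = list(map(operator.lt, left, right))

# 2) explicit ufunc call, writing the result into an existing array
x = np.arange(10)
mask = np.empty(x.shape, dtype=bool)  # preallocated output buffer
result = np.less(x, 5, out=mask)      # fills mask in place and returns it

print(with_operator, result is mask)
```

Reusing a buffer via out mainly matters when the same comparison runs repeatedly on large arrays and you want to avoid allocating a new result each time.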

That said, if what you are worried about is performance, you should do fair comparisons because, as you say, all your calls are eventually handled by numpy's less ufunc.

If your data is already in a numpy array, then you have already shown that they are all performing similarly, so go with the < operator for clarity.

What if your data is in a Python object, say a list? Well, here are some timings for you to ponder:

In [13]: x = range(10**5)

In [19]: %timeit [j < 5000 for j in x]
100 loops, best of 3: 5.32 ms per loop

In [20]: %timeit np.less(x, 5000)
100 loops, best of 3: 11.3 ms per loop

In [21]: %timeit [operator.lt(j, 5000) for j in x]
100 loops, best of 3: 16.2 ms per loop

Not sure why operator.lt is so slow, but you clearly want to stay away from it. If you want to get a numpy array as output from a Python object input, then this will probably be the fastest:

In [22]: %timeit np.fromiter((j < 5000 for j in x), dtype=bool, count=10**5)
100 loops, best of 3: 7.91 ms per loop

Note that ufuncs operating on numpy arrays are much faster than any of the above:

In [24]: y = np.array(x)

In [25]: %timeit y < 5000
10000 loops, best of 3: 82.8 us per loop
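For anyone who wants to rerun this comparison outside IPython, here is a rough sketch using the standard timeit module. It assumes Python 3, where range is no longer a list, so the data is materialized with list() first. The function names are hypothetical, and the absolute numbers will differ by machine and NumPy version, so no timings are shown.

```python
import timeit

import numpy as np

data = list(range(10**5))  # plain Python list, as in the timings above
arr = np.asarray(data)     # converted once, outside the timed code


def list_comprehension():
    return [j < 5000 for j in data]


def ufunc_on_list():
    # np.less has to convert the list to an array on every call
    return np.less(data, 5000)


def operator_on_array():
    return arr < 5000


for fn in (list_comprehension, ufunc_on_list, operator_on_array):
    runs = 10
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```

The point of converting once up front is that the list-to-array conversion is paid a single time instead of inside every comparison.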

Jaime answered Oct 07 '22 06:10