Using numpy ufuncs vs built-in operators

Q: What are ufuncs in NumPy?

ufunc stands for "universal function": a NumPy function that operates element-wise on ndarray objects. Why use ufuncs? They implement vectorization in NumPy, which is much faster than iterating over elements in Python. They also support broadcasting and provide additional methods such as reduce and accumulate that are very helpful for computation.
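As a small illustration of the points above (vectorization, broadcasting, and the reduce and accumulate methods), here is a minimal sketch. The array values are made up for the example.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Vectorized: one ufunc call replaces a Python-level loop over elements.
total = np.add(a, b)              # array([11, 22, 33, 44])

# Broadcasting: the scalar 10 is stretched to match the shape of a.
scaled = np.multiply(a, 10)       # array([10, 20, 30, 40])

# ufunc methods: reduce collapses an axis, accumulate keeps running results.
summed = np.add.reduce(a)         # 10
running = np.add.accumulate(a)    # array([ 1,  3,  6, 10])
```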

Q: What is the use of NumPy functions?

NumPy provides universal functions that cover a wide variety of operations, including standard trigonometric functions, arithmetic operations, complex-number handling, statistical functions, etc. These functions operate on ndarray (N-dimensional array), i.e. NumPy's array class.

Q: What is a ufunc in Python?

At the core of every ufunc is a one-dimensional strided loop that implements the actual function for a specific type combination. When a ufunc is created, it is given a static list of inner loops and a corresponding list of type signatures over which the ufunc operates.
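The list of type signatures described above can be inspected from Python through the `types` and `ntypes` attributes of a ufunc. The sketch below uses `np.add` as an example; the exact number of registered loops depends on the NumPy version, so no count is shown.

```python
import numpy as np

# Each entry is a signature "inputs->output" written with NumPy type characters,
# for example "dd->d" means (float64, float64) -> float64.
signatures = np.add.types
loop_count = np.add.ntypes        # equals len(signatures)

# The inner loop that runs is selected from the input dtypes.
int_result = np.add(np.array([1], dtype=np.int32), np.array([2], dtype=np.int32))
float_result = np.add(np.array([1.0]), np.array([2.0]))

print(signatures[:5])
print(int_result.dtype, float_result.dtype)
```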

Q: How do you create a universal function in Python?

A Python function can also be turned into a universal function with the library function numpy.frompyfunc. Some ufuncs are called automatically when the corresponding arithmetic operator is used on arrays. For example, when two arrays are added element-wise with the '+' operator, np.add() is called internally.
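A minimal sketch of both statements follows. The scalar function `clipped_diff` is a hypothetical example, not part of NumPy. Note that a ufunc built with `frompyfunc` returns object arrays and still calls the Python function once per element, so it gives ufunc behavior (broadcasting, methods) without the speed of a compiled loop.

```python
import numpy as np

def clipped_diff(a, b):
    """Hypothetical scalar function: a - b, floored at zero."""
    return max(a - b, 0)

# frompyfunc(func, nin, nout): 2 inputs, 1 output.
clipped_diff_ufunc = np.frompyfunc(clipped_diff, 2, 1)

x = np.array([5, 1, 7])
y = np.array([2, 3, 7])

# The raw result has dtype=object, so convert it back to integers.
result = clipped_diff_ufunc(x, y).astype(int)   # array([3, 0, 0])

# The + operator on arrays gives the same result as calling np.add directly.
same = np.array_equal(x + y, np.add(x, y))      # True
```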

Tags: python, numpy

I'm curious about the benefits and tradeoffs of using numpy ufuncs vs. the built-in operators vs. the 'function' versions of the built-in operators.

I'm curious about all ufuncs. Maybe there are times when some are more useful than others. However, I'll use < for my examples just for simplicity.

There are several ways to 'filter' a numpy array by a single number to get a boolean array. Each form gives the same results, but is there a preferred time/place to use one over the other? In this example I'm comparing an array against a single number, so all 3 will work.

Consider all examples using the following array:

>>> x = numpy.arange(0, 10000)
>>> x
array([   0,    1,    2, ..., 9997, 9998, 9999])

'<' operator

>>> x < 5000
array([ True,  True,  True, ..., False, False, False], dtype=bool)
>>> %timeit x < 5000
100000 loops, best of 3: 15.3 us per loop

operator.lt

>>> import operator
>>> operator.lt(x, 5000)
array([ True,  True,  True, ..., False, False, False], dtype=bool)
>>> %timeit operator.lt(x, 5000)
100000 loops, best of 3: 15.3 us per loop

numpy.less

>>> numpy.less(x, 5000)
array([ True,  True,  True, ..., False, False, False], dtype=bool)
>>> %timeit numpy.less(x, 5000)
100000 loops, best of 3: 15 us per loop

Note that all of them achieve pretty much equivalent performance and exactly the same results. I'm guessing that all of these calls end up in the same function anyway, since < and operator.lt both map to __lt__ on a numpy array, which is probably implemented using numpy.less or an equivalent?
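The claim that the three spellings give identical results can be checked directly. This is only a sketch of the equivalence at the result level; it does not prove which internal code path NumPy takes.

```python
import operator
import numpy as np

x = np.arange(0, 10000)

via_symbol = x < 5000                  # the < operator
via_dunder = x.__lt__(5000)            # what < calls on the array
via_operator = operator.lt(x, 5000)    # function form of <
via_ufunc = np.less(x, 5000)           # explicit ufunc call

all_equal = (
    np.array_equal(via_symbol, via_dunder)
    and np.array_equal(via_symbol, via_operator)
    and np.array_equal(via_symbol, via_ufunc)
)
print(all_equal)
```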

So, which is more 'idiomatic' and 'preferred'?

asked Mar 28 '13 18:03

durden2.0

1 Answer

Generally speaking, thinking of the "readability counts" mantra, the actual operator should always be your preferred choice. The operator module versions have a place when you can replace lambda a, b: a < b with the more compact operator.lt, but not much outside of that. And you really shouldn't be using explicit calls to the corresponding ufunc, unless you want to use the out parameter to store the calculated values directly in an existing array.
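The two exceptions mentioned above can be sketched as follows. The sample values and the preallocated `mask` array are assumptions made for the illustration.

```python
import operator
import numpy as np

# Case 1: operator.lt where a two-argument callable is expected,
# instead of writing lambda a, b: a < b.
left = [1, 3, 5]
right = [2, 3, 4]
with_lambda = list(map(lambda a, b: a < b, left, right))
with_operator = list(map(operator.lt, left, right))   # [True, False, False]

# Case 2: the explicit ufunc call, to write into an existing array via out.
x = np.arange(10)
mask = np.empty(x.shape, dtype=bool)   # preallocated result buffer
returned = np.less(x, 5, out=mask)     # fills mask in place and returns it
```

Reusing `mask` this way avoids allocating a new boolean array on every call, which can matter inside a tight loop.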

That said, if what you are worried about is performance, you should do fair comparisons, because, as you say, all your calls are eventually handled by numpy's less ufunc.

If your data is already in a numpy array, then you have already shown that they are all performing similarly, so go with the < operator for clarity.

What if your data is in a python object, say a list? Well, here are some timings for you to ponder:

In [13]: x = range(10**5)

In [19]: %timeit [j < 5000 for j in x]
100 loops, best of 3: 5.32 ms per loop

In [20]: %timeit np.less(x, 5000)
100 loops, best of 3: 11.3 ms per loop

In [21]: %timeit [operator.lt(j, 5000) for j in x]
100 loops, best of 3: 16.2 ms per loop

Not sure why operator.lt is so slow, but you clearly want to stay away from it. If you want to get a numpy array as output from a Python object input, then this will probably be the fastest:

In [22]: %timeit np.fromiter((j < 5000 for j in x), dtype=bool, count=10**5)
100 loops, best of 3: 7.91 ms per loop

Note that ufuncs operating on numpy arrays are much faster than any of the above:

In [24]: y = np.array(x)

In [25]: %timeit y < 5000
10000 loops, best of 3: 82.8 us per loop
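To rerun these comparisons outside IPython, something like the following sketch with the standard `timeit` module can be used. It assumes Python 3, where `range` is lazy, so the list is built explicitly to match the list input used above. Absolute numbers will differ by machine and NumPy version, so none are shown.

```python
import operator
import timeit
import numpy as np

x = list(range(10**5))   # plain Python list input
y = np.array(x)          # the same data as a numpy array

candidates = {
    "list comprehension": lambda: [j < 5000 for j in x],
    "np.less on list": lambda: np.less(x, 5000),
    "operator.lt in comprehension": lambda: [operator.lt(j, 5000) for j in x],
    "np.fromiter": lambda: np.fromiter((j < 5000 for j in x), dtype=bool, count=len(x)),
    "array < scalar": lambda: y < 5000,
}

timings = {}
for label, func in candidates.items():
    # Average over 10 calls; increase number for more stable measurements.
    seconds = timeit.timeit(func, number=10) / 10
    timings[label] = seconds
    print(f"{label:30s} {seconds * 1e3:8.3f} ms per call")
```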

answered Oct 07 '22 06:10

Jaime
