NumPy: Alternative to `vectorize` that lets me access the array

Tags: python, pypy, numpy

I have this code:

output_array = np.vectorize(f, otypes='d')(input_array)

And I'd like to replace it with this code, which is supposed to give the same output:

output_array = np.empty(input_array.shape, dtype='d')
for i, item in enumerate(input_array):
    output_array[i] = f(item)

The reason I want the second version is that I can then start iterating on output_array in a separate thread, while it's being calculated. (Yes, I know about the GIL, that part is taken care of.)

Unfortunately, the for loop is very slow, even when I'm not processing the data on a separate thread. I benchmarked it on both CPython and PyPy3, which is my target platform. On CPython it's 3 times slower than vectorize, and on PyPy3 it's 67 times slower than vectorize!

That's despite the fact that the NumPy documentation says "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."

Any idea why my implementation is slow, and how to make a fast implementation that still allows me to use output_array before it's finished?

Asked by Ram Rachum on Jul 10 '20 at 06:07


1 Answer

Sebastian Berg gave me a solution. When iterating over items from the input array, pass item.item() to f rather than just item. This converts the numpy.float64 objects into normal Python floats, making everything much faster and solving my particular problem :)
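For illustration, here is a minimal sketch of the loop from the question with that change applied. The function f and the helper name fill_with_python_floats are hypothetical stand-ins, not from the original post, and the sketch assumes a one-dimensional float input array, as the enumerate loop in the question does. No timings are claimed here; any speedup will depend on f and on the interpreter.

```python
import numpy as np


def f(x):
    # Hypothetical stand-in for the asker's scalar function.
    return x * x + 1.0


def fill_with_python_floats(func, input_array):
    # Preallocate the output so another consumer could read it while it fills.
    output_array = np.empty(input_array.shape, dtype='d')
    for i, item in enumerate(input_array):
        # Iterating a 1-D float64 array yields numpy.float64 scalars;
        # .item() converts each one to a plain Python float before calling func.
        output_array[i] = func(item.item())
    return output_array


input_array = np.linspace(0.0, 1.0, 5)
output_array = fill_with_python_floats(f, input_array)
```

The result should match np.vectorize(f, otypes='d')(input_array); the only difference from the original loop is the type of the value that f receives.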

Answered by Ram Rachum on Oct 25 '22 at 18:10