
Why is a numpy vectorized function apparently called an extra time?

I have a numpy object array containing several lists of index numbers:

>>> idxLsts = np.array([[1], [0, 2]], dtype=object)

I define a vectorized function to append a value to each list:

>>> idx = 99  
>>> f = np.vectorize(lambda idxLst: idxLst.append(idx))

I invoke the function. I don't care about the return value, just the side effect.

>>> f(idxLsts)  
array([None, None], dtype=object)

The index 99 was added twice to the first list. Why? I'm stumped.

>>> idxLsts
array([[1, 99, 99], [0, 2, 99]], dtype=object)

With other values of idxLsts, it doesn't happen:

>>> idxLsts = np.array([[1, 2], [0, 2, 4]], dtype=object)
>>> f(idxLsts)
array([None, None], dtype=object)
>>> idxLsts
array([[1, 2, 99], [0, 2, 4, 99]], dtype=object)

My suspicion is it's related to the documentation which says: "Define a vectorized function which takes a nested sequence of objects or numpy arrays as inputs and returns a numpy array as output. The vectorized function evaluates pyfunc over successive tuples of the input arrays like the python map function, except it uses the broadcasting rules of numpy."

Asked by RVS, Oct 26 '12 23:10




1 Answer

From the vectorize docstring:

The data type of the output of `vectorized` is determined by calling
the function with the first element of the input.  This can be avoided
by specifying the `otypes` argument.

And from the code:

        theout = self.thefunc(*newargs)

This is an extra call to thefunc, used to determine the output type. This is why the first element is getting two 99s appended.

This behavior happens in your second case as well:

import numpy as np
idxLsts = np.array([[1, 2], [0,2,4]], dtype = object)
idx = 99
f = np.vectorize(lambda x: x.append(idx))
f(idxLsts)
print(idxLsts)

yields

[[1, 2, 99, 99] [0, 2, 4, 99]]
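One way to observe the extra call directly is to count invocations. This is only an illustrative sketch: the names `calls` and `record` are hypothetical and do not appear in the question, and the array is built element by element simply to make sure it is a 1-D object array of lists.

```python
import numpy as np

# Build a 1-D object array holding two Python lists.
idxLsts = np.empty(2, dtype=object)
idxLsts[0] = [1, 2]
idxLsts[1] = [0, 2, 4]

calls = []  # hypothetical helper: stores a copy of the argument of every call

def record(lst):
    calls.append(list(lst))

# No otypes given, so vectorize probes the function with the first element.
f = np.vectorize(record)
f(idxLsts)

print(len(calls))            # one more than the number of elements
print(calls[0] == calls[1])  # the first element was passed in twice
```

With two elements in the array, the function should be invoked three times, and the first two invocations receive the same list.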

You could use np.frompyfunc instead of np.vectorize:

import numpy as np
idxLsts = np.array([[1, 2], [0,2,4]], dtype = object)
idx = 99
f = np.frompyfunc(lambda x: x.append(idx), 1, 1)
f(idxLsts)
print(idxLsts)

yields

[[1, 2, 99] [0, 2, 4, 99]]
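The docstring quoted above also mentions the `otypes` argument as a way to avoid the extra call. A minimal sketch, assuming an output type of `object` is acceptable because the return values (all `None`) are discarded anyway:

```python
import numpy as np

# Build a 1-D object array holding two Python lists.
idxLsts = np.empty(2, dtype=object)
idxLsts[0] = [1, 2]
idxLsts[1] = [0, 2, 4]

idx = 99
# otypes=[object] declares the output dtype up front, so vectorize
# does not need a trial call on the first element to infer it.
f = np.vectorize(lambda x: x.append(idx), otypes=[object])
f(idxLsts)

print(idxLsts)
```

Each list should end up with a single 99 appended. Since only the side effect matters here, a plain `for` loop over `idxLsts` would also do the job and sidesteps the question of output types entirely.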
Answered by unutbu, Nov 14 '22 22:11