
How can I map a Python callable over a NumPy array in a way that is both elegant and efficient?

The canonical approach, np.vectorize(), does not work for an empty array; it fails with IndexError: index 0 is out of bounds for axis 0 with size 0:

>>> def f(x):
...     return x + 1
...
>>> F = np.vectorize(f)
>>> F(np.array([]))
[Traceback removed]
IndexError: index 0 is out of bounds for axis 0 with size 0

At the moment I use

>>> np.array([f(x) for x in X])

but I am looking for a more elegant (and more efficient) solution. In Python 2 I could use

>>> np.array(map(f, X))

but that fails in Python 3, where map returns an iterator rather than a list.

[EDIT]

This question is not answered by "Efficient evaluation of a function at every cell of a NumPy array", because:

  • np.vectorize fails on an empty array,
  • the OP there asked for a solution that works in place: A(i, j) := f(A(i, j)).
abukaj asked Mar 17 '17 17:03


1 Answer

np.vectorize should work with an f that expects a scalar, but it does have a problem with an empty input:

In [364]: def f(x):
     ...:     return 2*x
     ...: 
In [365]: fv = np.vectorize(f)
In [366]: fv(np.array([1,2,3]))
Out[366]: array([2, 4, 6])
In [367]: fv(np.array([]))
....
ValueError: cannot call `vectorize` on size 0 inputs unless `otypes` is set

My error is different from yours, but that may be a version difference. Without otypes, vectorize makes a test calculation on the first element to determine the return type; with an empty input there is no first element, so the call fails. The fix is to specify otypes:

In [368]: fv1 = np.vectorize(f, otypes=[int])
In [369]: fv1(np.array([]))
Out[369]: array([], dtype=int32)

vectorize is built on np.frompyfunc; the main difference is that frompyfunc always returns an object array. Both have the big advantage of handling broadcasting: the input can be an array of any shape and dimension, and with multiple inputs they are broadcast against each other. But the vectorize documentation warns that it is still essentially a Python loop, so it does not promise any speed improvement.
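As a minimal sketch of the broadcasting behaviour described above, the following applies a two-argument scalar function to a column and a row. The function add and the variable names are made up for illustration.

```python
import numpy as np

def add(x, y):
    # Plain scalar function; it knows nothing about arrays.
    return x + y

# otypes fixes the output dtype up front, so no test call is needed.
vadd = np.vectorize(add, otypes=[float])
# frompyfunc takes the function, the number of inputs and the number of outputs.
uadd = np.frompyfunc(add, 2, 1)

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,)

a = vadd(col, row)   # inputs broadcast to shape (3, 4), dtype float64
b = uadd(col, row)   # same shape, but dtype object

print(a.shape, a.dtype)
print(b.shape, b.dtype)
```

Both calls still invoke add once per output element in Python, so this is a convenience rather than a speed-up.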

Read the comments of the accepted answer in the linked question.

But if the array (or list) is always 1-D, you don't need that extra power. A plain Python iteration is as good as anything. It could be map (or list(map(...)) in Python 3), but I prefer the clarity of a list comprehension.

ret = [f(x) for x in X]  # works with list or array
np.array(ret)    # if you want an array 
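If the output dtype is known in advance, np.fromiter is another option for the 1-D case: it builds the array directly from an iterator and returns an empty array for an empty input. This is a sketch under that assumption; map_1d is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def f(x):
    return x + 1

def map_1d(func, X, dtype=float):
    # Hypothetical helper: apply func to each element of a 1-D sequence.
    # fromiter needs an explicit dtype, so no test call on X[0] is made.
    return np.fromiter((func(x) for x in X), dtype=dtype)

print(map_1d(f, np.array([1, 2, 3])))
print(map_1d(f, np.array([])))
```

This is still a Python-level loop, so it is about clarity and the empty-input case rather than performance.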

One thing to watch out for is the type of the elements produced by the various iteration methods.

Iteration on an array produces array elements, that is, NumPy scalar types:

In [387]: [type(x) for x in np.arange(3)]
Out[387]: [numpy.int32, numpy.int32, numpy.int32]

Iteration on a list returns the elements, whatever they are:

In [388]: [type(x) for x in [1,2,3]]
Out[388]: [int, int, int]

Iteration with frompyfunc passes Python scalars to the function, whereas map over an array sees NumPy scalars:

In [389]: ff=np.frompyfunc(type,1,1)
In [390]: ff(np.arange(3))
Out[390]: array([<class 'int'>, <class 'int'>, <class 'int'>], dtype=object)
In [391]: ff([1,2,3])
Out[391]: array([<class 'int'>, <class 'int'>, <class 'int'>], dtype=object)
In [393]: list(map(type,np.arange(3)))
Out[393]: [numpy.int32, numpy.int32, numpy.int32]
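If f needs native Python objects while iterating over an array directly, one way (a sketch, not the only approach) is to convert with tolist() or item() first:

```python
import numpy as np

X = np.arange(3)

# Iterating the array yields NumPy scalar objects.
numpy_types = [type(x) for x in X]
# tolist() converts the elements to native Python objects first.
python_types = [type(x) for x in X.tolist()]
# item() converts a single NumPy scalar to its Python equivalent.
one = X[0].item()

print(numpy_types)
print(python_types)
print(type(one))
```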

The part of vectorize that parses otypes is:

    if isinstance(otypes, str):
        for char in otypes:
            if char not in typecodes['All']:
                raise ValueError("Invalid otype specified: %s" % (char,))
    elif iterable(otypes):
        otypes = ''.join([_nx.dtype(x).char for x in otypes])
    elif otypes is not None:
        raise ValueError("Invalid otype specification")
    self.otypes = otypes

and

In [423]: np.typecodes['All']
Out[423]: '?bhilqpBHILQPefdgFDGSUVOMm'
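Going by the parsing code above, otypes can be given either as a string of typecodes or as an iterable of dtypes. A small sketch of both forms on an empty input, reusing the f from earlier in this answer:

```python
import numpy as np

def f(x):
    return 2 * x

# 'd' is the typecode for float64; both forms describe one float output.
fv_str = np.vectorize(f, otypes='d')
fv_list = np.vectorize(f, otypes=[float])

empty = np.array([])
r1 = fv_str(empty)
r2 = fv_list(empty)

# The list form is reduced to typecode characters via dtype(x).char.
code = np.dtype(float).char

print(r1.dtype, r2.dtype, code)
```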
hpaulj answered Sep 20 '22 18:09