 

How do I vectorize a function with Numba, when the function takes arrays as arguments?

I'd like to use Numba to vectorize a function that evaluates each row of a matrix. This would essentially apply a NumPy ufunc to the matrix, as opposed to looping over the rows. According to the docs:

You might ask yourself, “why would I go through this instead of compiling a simple iteration loop using the @jit decorator?”. The answer is that NumPy ufuncs automatically get other features such as reduction, accumulation or broadcasting.

With that in mind, I can't get even a toy example to work. The following simple example tries to calculate the sum of elements in each row.

import numba, numpy as np

# Define the row-wise function to be vectorized:
@numba.guvectorize(["void(float64[:],float64)"],"(n)->()")
def f(a,b):
    b = a.sum() 

# Apply the function to an array with five rows:
a = np.arange(10).reshape(5,2)
b = f(a)   

I used the @guvectorize decorator, since I'd like the decorated function to take the argument a as each row of the matrix, which is an array; @vectorize takes only scalar inputs. I also wrote the signature to take an array argument and modify a scalar output. As per the docs, the decorated function does not use a return statement.

The result should be b = [1,5,9,13,17], but instead I got b=[0.,1.,2.,3.,4.]. Clearly, I'm missing something. I'd appreciate some direction, keeping in mind that the sum is just an example.

asked Jul 16 '18 20:07 by Aboottogo



2 Answers

b = a.sum() can never modify the caller's value of b: in Python, assigning to a parameter name only rebinds the local variable.
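The same behavior shows up in plain Python and NumPy, with no Numba involved. Below is a minimal sketch using two hypothetical helpers, `rebind` and `write_into`, named for illustration only. Assigning to the parameter name changes nothing for the caller, while slice assignment mutates the array the caller passed in.

```python
import numpy as np

def rebind(b):
    # Rebinding only changes what the local name `b` refers to;
    # the caller's array is untouched.
    b = 42.0

def write_into(b):
    # Slice assignment writes into the array object the caller passed in.
    b[:] = 42.0

out = np.zeros(1)
rebind(out)
print(out)       # [0.]
write_into(out)
print(out)       # [42.]
```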

Numba gets around this by requiring every parameter of a gufunc to be an array: a scalar output is passed in as a length-1 array that you can then assign into. So both parameters need to be declared as arrays, and the assignment must index into the output with []:

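import numba, numpy as np

@numba.guvectorize(["void(float64[:],float64[:])"],"(n)->()")
def f(a,b):
    # b arrives as a length-1 array: assign into it instead of rebinding the name
    b[:] = a.sum()
    # or b[0] = a.sum()

a = np.arange(10).reshape(5,2)
f(a)
Out[246]: array([ 1.,  5.,  9., 13., 17.])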
answered Oct 24 '22 04:10 by chrisb


@chrisb has a great answer above. This answer adds a bit of clarification for those newer to vectorization.

With vectorization (in NumPy and Numba), you write a function that works on single elements and then pass it arrays of inputs; the function is applied element by element.

For example:

import numpy as np

a=[1,2]
b=[3,4]

@np.vectorize
def f(x_1,x_2):
    return x_1+x_2

print(f(a,b))
#-> [4 6]

In Numba, you traditionally pass type signatures to the vectorize decorator. In more recent versions of Numba you can leave them out: if you call a generically vectorized function with NumPy arrays, Numba compiles a version for those input types on the first call.

For example:

import numpy as np
import numba as nb

a=np.array([1,2])
b=np.array([3,4])

# Note a generic vectorize decorator with input types not specified
@nb.vectorize
def f(x_1,x_2):
    return x_1+x_2

print(f(a,b))
#-> [4 6]

So far, each argument the function sees is a single scalar taken from the input arrays. That is what lets Numba turn the Python code into a simple ufunc that operates element by element on NumPy arrays.

In your example of summing a vector, each call needs a whole row, so the data is effectively a vector of vectors. For that you need ufuncs that operate on vectors themselves, which takes a bit more work, including specifying how the outputs should be created. Enter the guvectorize decorator (see the guvectorize documentation).

Since you are providing a vector of vectors, the outer vector is handled much as it is with vectorize above: the function is called once per element. What you now need to specify is what each inner vector looks like for your input values.

For example, a first attempt at summing an arbitrary vector of integers might look like this (it will not work yet, for reasons explained below):

@nb.guvectorize([(nb.int64[:])])
def f(x):
    return x.sum()

First, you need to add an extra argument to both the function and the decorator. This argument holds the output: instead of returning a value, the function writes its result into this argument. Think of this final argument as storage that Numba provides for each element of the output array it builds when creating the ufunc.

This output argument also needs a type in the decorator, so the function should look something like this:

@nb.guvectorize([(nb.int64[:],nb.int64[:])])
def f(x, out):
    out[:]=x.sum()

Finally, you need to specify the input and output layouts in the decorator. These are written as array shapes, one per input in argument order, followed by an arrow and the output shape (the output corresponds to the function's final argument). In this case you take a vector of size n and output a single value rather than a vector, so the layout is (n)->().
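To experiment with layout strings before involving Numba, you can use NumPy's own `np.vectorize`, which accepts the same notation through its `signature` argument. This is a plain-NumPy analog, not Numba: it still loops in Python, so it is a convenience rather than a speedup. The names `row_sum` and `row_dot` are illustrative.

```python
import numpy as np

a = np.arange(10).reshape(5, 2)

# '(n)->()': each call receives one row of length n and returns a scalar
row_sum = np.vectorize(lambda row: row.sum(), signature='(n)->()')
print(row_sum(a))      # [ 1  5  9 13 17]

# '(n),(n)->()': two inputs share the core dimension n; one scalar per pair of rows
row_dot = np.vectorize(np.dot, signature='(n),(n)->()')
print(row_dot(a, a))   # [  1  13  41  85 145]
```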

As a more complex example, for matrix multiplication with input matrices of shapes (m,n) and (n,o) and an output of shape (m,o), the layout would be (m,n),(n,o)->(m,o).

A complete function for the current problem would look something like:

@nb.guvectorize([(nb.int64[:],nb.int64[:])], '(n)->()')
def f(x, out):
    out[:]=x.sum()

Your end code should look something like:

import numpy as np
import numba as nb

a=np.arange(10).reshape(5,2)
# Equivalent to
# a=np.array([
#   [0,1],
#   [2,3],
#   [4,5],
#   [6,7],
#   [8,9]
# ])

@nb.guvectorize([(nb.int64[:],nb.int64[:])], '(n)->()')
def f(x, out):
    out[:]=x.sum()

print(f(a))
#-> [ 1  5  9 13 17]
answered Oct 24 '22 05:10 by conmak