How do I vectorize a function with Numba, when the function takes arrays as arguments?

Tags: python, numpy, scipy, numba

I'd like to use Numba to vectorize a function that will evaluate each row of a matrix. This would essentially apply a NumPy ufunc to the matrix, as opposed to looping over the rows. According to the docs:

You might ask yourself, “why would I go through this instead of compiling a simple iteration loop using the @jit decorator?”. The answer is that NumPy ufuncs automatically get other features such as reduction, accumulation or broadcasting.

With that in mind, I can't get even a toy example to work. The following simple example tries to calculate the sum of elements in each row.

import numba, numpy as np

# Define the row-wise function to be vectorized:
@numba.guvectorize(["void(float64[:],float64)"],"(n)->()")
def f(a,b):
    b = a.sum() 

# Apply the function to an array with five rows:
a = np.arange(10).reshape(5,2)
b = f(a)

I used the @guvectorize decorator, since I'd like the decorated function to take the argument a as each row of the matrix, which is an array; @vectorize takes only scalar inputs. I also wrote the signature to take an array argument and modify a scalar output. As per the docs, the decorated function does not use a return statement.

The result should be b = [1,5,9,13,17], but instead I got b=[0.,1.,2.,3.,4.]. Clearly, I'm missing something. I'd appreciate some direction, keeping in mind that the sum is just an example.


asked Jul 16 '18 20:07

Aboottogo

2 Answers

In Python, b = a.sum() only rebinds the local name b inside the function; it can never modify the array object the caller passed in.
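The same rebinding behaviour can be seen without Numba at all. This small sketch uses two hypothetical helper functions, rebind and write_into, on a plain NumPy array: plain assignment leaves the caller's array unchanged, while slice assignment writes into it.

```python
import numpy as np

def rebind(b):
    # Assigning to a bare name only rebinds the local variable b;
    # the array the caller passed in is not touched.
    b = 42

def write_into(b):
    # Slice assignment writes into the array object the caller passed in.
    b[:] = 42

out = np.zeros(1)
rebind(out)
print(out)       # [0.]
write_into(out)
print(out)       # [42.]
```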

Numba gets around this by requiring every parameter of a gufunc to be an array: a scalar output is just a length-1 array that you can assign into. So both parameters need to be declared as arrays, and the assignment must index into the output with [] (for example b[:] or b[0]).

@numba.guvectorize(["void(float64[:],float64[:])"],"(n)->()")
def f(a,b):
    b[:] = a.sum()
    # or b[0] = a.sum()

f(a)
# -> array([ 1.,  5.,  9., 13., 17.])


answered Oct 24 '22 04:10

chrisb

@chrisb has a great answer above. This answer should add a bit of clarification for those newer to vectorization.

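In vectorization (in both NumPy and Numba), you write a function that works on individual elements and pass it vectors (arrays) of inputs; the function is then applied element by element.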

For example:

import numpy as np

a=[1,2]
b=[3,4]

@np.vectorize
def f(x_1,x_2):
    return x_1+x_2

print(f(a,b))
#-> [4 6]

In Numba, you would traditionally pass type signatures to the vectorize decorator. In more recent versions of Numba you can omit them: a bare @nb.vectorize creates a dynamic ufunc that compiles a specialization for the argument types the first time it is called with NumPy arrays.

For example:

import numpy as np
import numba as nb

a=np.array([1,2])
b=np.array([3,4])

# Note a generic vectorize decorator with input types not specified
@nb.vectorize
def f(x_1,x_2):
    return x_1+x_2

print(f(a,b))
#-> [4 6]

So far, each argument is a single scalar taken element by element from the input arrays. That is what allows Numba to compile the Python code into a simple ufunc that operates on NumPy arrays.

In your example of summing a vector, the data you pass is a vector of vectors (one inner vector per row). For that you need ufuncs that operate on vectors themselves, which requires a bit more work, including specifying how the outputs should be created. Enter the guvectorize decorator (see the Numba documentation on guvectorize and generalized ufuncs).

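Because you are providing a vector of vectors, the outer vector is handled the same way vectorize handles its inputs above: the function is called once per element, where each element is now an inner vector. What is new is that you have to specify what each inner vector looks like for your input values.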

For example, a first attempt at summing an arbitrary vector of integers might look like this (it will not work yet, for the reasons explained below):

@nb.guvectorize([(nb.int64[:])])
def f(x):
    return x.sum()

First, you need to add an extra argument to both your function and the decorator signature. This argument holds the output of your function: instead of returning a value, you now write the result into it. Think of this final argument as the output array. When the gufunc is called without it, Numba allocates it and passes the matching slice into each call of your function.

The type of this output argument also needs to be specified in the decorator, so your function should look something like this:

@nb.guvectorize([(nb.int64[:],nb.int64[:])])
def f(x, out):
    out[:]=x.sum()

Finally, you need to specify the layout of the inputs and output in the decorator. The layout lists the core shape of each input in order, then an arrow, then the shape of the output (which corresponds to the final argument of your function). In this case you take a vector of size n and output a single value rather than a vector, so the layout is (n)->().

As a more complex example, for matrix multiplication of two inputs of shape (m,n) and (n,o) producing an output of shape (m,o), the layout would be (m,n),(n,o)->(m,o).

A complete function for the current problem would look something like:

@nb.guvectorize([(nb.int64[:],nb.int64[:])], '(n)->()')
def f(x, out):
    out[:]=x.sum()

Your end code should look something like:

import numpy as np
import numba as nb

a=np.arange(10).reshape(5,2)
# Equivalent to
# a=np.array([
#   [0,1],
#   [2,3],
#   [4,5],
#   [6,7],
#   [8,9]
# ])

@nb.guvectorize([(nb.int64[:],nb.int64[:])], '(n)->()')
def f(x, out):
    out[:]=x.sum()

print(f(a))
#-> [ 1  5  9 13 17]

answered Oct 24 '22 05:10

conmak
