I want to recreate a numpy.sum
-like function. I don't plan to recreate numpy.sum
but a similar function with the same principle: Iterate over the items and do something with each of them and then return a result.
How can I make one numba function that understands the "reduce along axis x
" behaviour of the numpy-functions.
Let's say the basic function looks like this (taken from the numba source code):
def numba_sum(arr):
s = 0.
for val in np.nditer(arr):
s += val.item()
return s
This works great if I numba.jit
that, but then it doesn't support any axis
-argument.
numba.vectorize
isn't much better, it offers .reduce(axis=x)
but only if the function is binary (accepts two arguments) , which the one above is not, and even then it supports only one scalar axis.
numba.guvectorize
can help, but I need to explicitly define when creating the function along which axis (if any) I want to reduce the function.
In short how can I make the function numba_sum
work like numpy.sum
, that is, it should support:
axis=None
, axis=x
(x
integer) and axis=(x,y)
in nopython=True
mode?
nansum. Return the sum of array elements over a given axis treating Not a Numbers (NaNs) as zero.
sum with the axis parameter, the function will sum the values along a particular axis. In particular, when we use np. sum with axis = 0 , the function will sum over the 0th axis (the rows). It's basically summing up the values row-wise, and producing a new array (with lower dimensions).
What do you get if you apply NumPy sum () to a list that contains only Boolean values? sum receives an array of booleans as its argument, it'll sum each element (count True as 1 and False as 0) and return the outcome.
sum() in Python. numpy. sum(arr, axis, dtype, out) : This function returns the sum of array elements over the specified axis.
I know the following is not the most concise or efficient implementation, but it is a working one! The general idea is that you need one loop to handle the output and other loop to aggregate the values over the indices the correspond to the output.
I have tried to minimise the use of various numpy operations inside numba keeping it closer to something implemented in C. (for the sake of clarity, np.ndindex was kept.) Some setup needs to occur before the function is called, and we just do this in plain python.
This implementation is by no means optimal or efficient, but it gets the job done and may help with inspiration for a more efficient version.
import numpy as np
from numpy.core.numeric import normalize_axis_index
import numba
@numba.njit()
def _numba_sum_into(arr, out, mask_out, mask_agg, shape_out, shape_agg):
# get the multipliers
# these times the position gives the index in the flattened array
# this uses the same logic as np.ravel_multi_index
multipliers = np.ones(arr.ndim, dtype='int32')
for i in range(arr.ndim - 2, -1, -1):
multipliers[i] = arr.shape[i + 1] * multipliers[i + 1]
# multiplier components
multipliers_agg = multipliers[mask_agg]
multipliers_out = multipliers[mask_out]
# flattened array
a = arr.flatten()
# loop over the kept values
for pos_out in np.ndindex(shape_out):
total = 0.
# this uses the same logic as np.ravel_multi_index
i = 0
for p, m in zip(pos_out, multipliers_out):
i += p * m
# loop over the aggregate values
for pos_agg in np.ndindex(shape_agg):
# this uses the same logic as np.ravel_multi_index
j = 0
for p, m in zip(pos_agg, multipliers_agg):
j += p * m
# update the total
total += a[i + j]
# save the total
out[pos_out] = total
# done!
return out
@lru_cache()
def _normalize_axis(axis, ndim: int) -> tuple:
if axis is None:
axis = np.arange(ndim)
axis = np.core.numeric.normalize_axis_tuple(axis, ndim, allow_duplicate=False)
return axis
@lru_cache()
def _get_out_and_agg_parts(norm_axis, ndim: int, shape: tuple):
axis = np.array(norm_axis)
# get used dims
mask_agg = np.zeros(ndim, dtype='bool')
mask_agg[axis] = True
mask_out = ~mask_agg
# make immutable
mask_agg.flags['WRITEABLE'] = False
mask_out.flags['WRITEABLE'] = False
# get the shape
shape = np.array(shape)
shape_agg = tuple(shape[mask_agg].tolist())
shape_out = tuple(shape[mask_out].tolist())
# done
return mask_out, mask_agg, shape_out, shape_agg
def numba_sum(arr, axis=None):
axis = _normalize_axis(axis, arr.ndim)
# get the various shapes
mask_out, mask_agg, shape_out, shape_agg = _get_out_and_agg_parts(axis, arr.ndim, arr.shape)
# make the output array
out = np.zeros(shape_out, dtype=arr.dtype)
# write into the array
_numba_sum_into(arr, out, mask_out, mask_agg, shape_out, shape_agg)
# done!
return out
if __name__ == '__main__':
arr = np.random.random([2, 3, 4])
print(numba_sum(arr, axis=None))
print(numba_sum(arr, axis=(0, 1, 2)))
print(numba_sum(arr, axis=(0, 2)))
print(numba_sum(arr, axis=(0, -1)))
print(numba_sum(arr, axis=0))
print(numba_sum(arr, axis=-1))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With