Optimizing access on numpy arrays for numba

Question

I recently stumbled upon numba and thought about replacing some homemade C extensions with more elegant autojitted python code. Unfortunately I wasn't happy, when I tried a first, quick benchmark. It seems like numba is not doing much better than ordinary python here, though I would have expected nearly C-like performance:

from numba import jit, autojit, uint, double
import numpy as np
import imp
import logging
logging.getLogger('numba.codegen.debug').setLevel(logging.INFO)

def sum_accum(accmap, a):
    res = np.zeros(np.max(accmap) + 1, dtype=a.dtype)
    for i in xrange(len(accmap)):
        res[accmap[i]] += a[i]
    return res

autonumba_sum_accum = autojit(sum_accum)
numba_sum_accum = jit(double[:](int_[:], double[:]), 
                      locals=dict(i=uint))(sum_accum)

accmap = np.repeat(np.arange(1000), 2)
np.random.shuffle(accmap)
accmap = np.repeat(accmap, 10)
a = np.random.randn(accmap.size)

ref = sum_accum(accmap, a)
assert np.all(ref == numba_sum_accum(accmap, a))
assert np.all(ref == autonumba_sum_accum(accmap, a))

%timeit sum_accum(accmap, a)
%timeit autonumba_sum_accum(accmap, a)
%timeit numba_sum_accum(accmap, a)

accumarray = imp.load_source('accumarray', '/path/to/accumarray.py')
assert np.all(ref == accumarray.accum(accmap, a))

%timeit accumarray.accum(accmap, a)

This gives on my machine:

10 loops, best of 3: 52 ms per loop
10 loops, best of 3: 42.2 ms per loop
10 loops, best of 3: 43.5 ms per loop
1000 loops, best of 3: 321 us per loop

I'm running the latest numba version from pypi, 0.11.0. Any suggestions, how to fix the code, so it runs reasonably fast with numba?

Michael · Accepted Answer

I figured out myself. numba wasn't able to determine the type of the result of np.max(accmap), even if the type of accmap was set to int. This somehow slowed down everything, but the fix is easy:

@autojit(locals=dict(reslen=uint))
def sum_accum(accmap, a):
    reslen = np.max(accmap) + 1
    res = np.zeros(reslen, dtype=a.dtype)
    for i in range(len(accmap)):
        res[accmap[i]] += a[i]
    return res

The result is quite impressive, about 2/3 of the C version:

10000 loops, best of 3: 192 us per loop

Update 2022: The work on this issue led to the python package numpy_groupies, which is available here:

https://github.com/ml31415/numpy-groupies

M4rtini · Answer

@autojit
def numbaMax(arr):
    MAX = arr[0]
    for i in arr:
        if i > MAX:
            MAX = i
    return MAX

@autojit
def autonumba_sum_accum2(accmap, a):
    res = np.zeros(numbaMax(accmap) + 1)
    for i in xrange(len(accmap)):
        res[accmap[i]] += a[i]
    return res

10 loops, best of 3: 26.5 ms per loop <- original
100 loops, best of 3: 15.1 ms per loop <- with numba but the slow numpy max
10000 loops, best of 3: 47.9 µs per loop <- with numbamax

Optimizing access on numpy arrays for numba

Tags:

python

jit

numpy

numba

accumarray

Michael

2 Answers

Michael

M4rtini

Recent Activity

Donate For Us

Optimizing access on numpy arrays for numba

Tags:

python

jit

numpy

numba

accumarray

Michael

2 Answers

Michael

M4rtini

Related questions

Recent Activity

Donate For Us