Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

numba-safe version of itertools.combinations?

I have some code which loops through a large set of itertools.combinations, which is now a performance bottleneck. I'm trying to turn to numba's @jit(nopython=True) to speed it up, but I'm running into some issues.

First, it seems numba can't handle itertools.combinations itself, per this small example:

import itertools
import numpy as np
from numba import jit

arr = [1, 2, 3]
c = 2

@jit(nopython=True)
def using_it(arr, c):
    return itertools.combinations(arr, c)

for i in using_it(arr, c):
    print(i)

throw error: numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend) Unknown attribute 'combinations' of type Module(<module 'itertools' (built-in)>)

After some googling, I found this github issue where the questioner proposed this numba-safe function for calculating permutations:

@jit(nopython=True)
def permutations(A, k):
    r = [[i for i in range(0)]]
    for i in range(k):
        r = [[a] + b for a in A for b in r if (a in b)==False]
    return r

Leveraging that, I can then easily filter down to combinations:

@jit(nopython=True)
def combinations(A, k):
    return [item for item in permutations(A, k) if sorted(item) == item]

Now I can run that combinations function without errors and get the correct result. However, this is now dramatically slower with the @jit(nopython=True) than without it. Running this timing test:

A = list(range(20))  # numba throws 'cannot determine numba type of range' w/o list
k = 2
start = pd.Timestamp.utcnow()
print(combinations(A, k))
print(f"took {pd.Timestamp.utcnow() - start}")

clocks in at 2.6 seconds with the numba @jit(nopython=True) decorators, and under 1/000 of a second with them commented out. So that's not really a workable solution for me either.

like image 537
Max Power Avatar asked Apr 17 '20 00:04

Max Power


People also ask

Is Numba faster than NumPy?

Large data For larger input data, Numba version of function is must faster than Numpy version, even taking into account of the compiling time. In fact, the ratio of the Numpy and Numba run time will depends on both datasize, and the number of loops, or more general the nature of the function (to be compiled).

Is Numba as fast as C?

the name “Numba” comes from “NumPy” + “Mamba” which is a clever wordplay because Mamba snakes are known to crunch their prey quicker and generally faster than Pythons. The machine code generated by Numba is as fast as languages like C, C++, and Fortran without having to code in those languages.

Does Numba work with Dicts?

Numba only supports the use of dict() without any arguments.

Does Numba speed up for loops?

With Numba, you can speed up all of your calculation focused and computationally heavy python functions(eg loops). It also has support for numpy library! So, you can use numpy in your calculations too, and speed up the overall computation as loops in python are very slow.


1 Answers

There is not much to gain with Numba in this case as itertools.combinations is written in C.

If you want to benchmark it, here is a Numba / Python implementation of what itertools.combinatiions does:

@jit(nopython=True)
def using_numba(pool, r):
    n = len(pool)
    indices = list(range(r))
    empty = not(n and (0 < r <= n))

    if not empty:
        result = [pool[i] for i in indices]
        yield result

    while not empty:
        i = r - 1
        while i >= 0 and indices[i] == i + n - r:
            i -= 1
        if i < 0:
            empty = True
        else:
            indices[i] += 1
            for j in range(i+1, r):
                indices[j] = indices[j-1] + 1

            result = [pool[i] for i in indices]
            yield result

On my machine, this is about 15 times slower than itertools.combinations. Getting the permutations and filtering the combinations would certainly be even slower.

like image 183
Jacques Gaudin Avatar answered Oct 28 '22 11:10

Jacques Gaudin