numba-safe version of itertools.combinations?

I have some code which loops through a large set of itertools.combinations, which is now a performance bottleneck. I'm trying to turn to numba's @jit(nopython=True) to speed it up, but I'm running into some issues.

First, it seems numba can't handle itertools.combinations itself, per this small example:

import itertools
import numpy as np
from numba import jit

arr = [1, 2, 3]
c = 2

@jit(nopython=True)
def using_it(arr, c):
    return itertools.combinations(arr, c)

for i in using_it(arr, c):
    print(i)

This throws the error: numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend) Unknown attribute 'combinations' of type Module(<module 'itertools' (built-in)>)

After some googling, I found this GitHub issue, where the questioner proposed this numba-safe function for calculating permutations:

@jit(nopython=True)
def permutations(A, k):
    r = [[i for i in range(0)]]
    for i in range(k):
        r = [[a] + b for a in A for b in r if (a in b)==False]
    return r

Leveraging that, I can then easily filter down to combinations:

@jit(nopython=True)
def combinations(A, k):
    return [item for item in permutations(A, k) if sorted(item) == item]
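To get a feel for how much work the permute-then-filter approach does, the sketch below counts how many k-permutations the final round builds versus how many combinations survive the sorted filter. The helper name filter_overhead is hypothetical (it is not part of the question's code), and math.perm / math.comb need Python 3.8 or later.

```python
import math

def filter_overhead(n, k):
    """Return (permutations built in the last round, combinations kept)."""
    candidates = math.perm(n, k)  # ordered selections without repetition
    kept = math.comb(n, k)        # only the sorted ones pass the filter
    return candidates, kept

for k in (2, 5, 10):
    candidates, kept = filter_overhead(20, k)
    # The ratio candidates / kept is k!, so the waste grows quickly with k.
    print(f"k={k}: built {candidates}, kept {kept}, ratio {candidates // kept}")
```

For k = 2 the overhead is only a factor of two, so the filter is unlikely to explain a multi-second runtime on 20 items by itself.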

Now I can run that combinations function without errors and get the correct result. However, this is now dramatically slower with the @jit(nopython=True) than without it. Running this timing test:

import pandas as pd

A = list(range(20))  # numba throws 'cannot determine numba type of range' w/o list
k = 2
start = pd.Timestamp.utcnow()
print(combinations(A, k))
print(f"took {pd.Timestamp.utcnow() - start}")

clocks in at 2.6 seconds with the numba @jit(nopython=True) decorators, and under 1/1000 of a second with them commented out. So that's not really a workable solution for me either.
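One thing worth separating in a timing like this: numba compiles a jitted function the first time it is called with a given argument type, so a single timed call includes compilation. Below is a minimal sketch of timing the first and second calls separately. It uses a hypothetical count_pairs function rather than the combinations code above, and the no-op njit fallback is an assumption added only so the sketch still runs where numba is not installed.

```python
import time

try:
    from numba import njit  # njit is shorthand for jit(nopython=True)
except ImportError:
    def njit(func):
        # Assumption: no-op stand-in so the sketch runs without numba.
        return func

@njit
def count_pairs(n):
    # Hypothetical workload: count index pairs (i, j) with i < j.
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
    return total

def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

# The first call pays for compilation (when numba is present); the second does not.
first_result, first_seconds = timed(count_pairs, 20)
second_result, second_seconds = timed(count_pairs, 20)
print(f"first call:  {first_seconds:.6f} s")
print(f"second call: {second_seconds:.6f} s")
```

Whether compilation accounts for most of the 2.6 seconds reported here is not something this sketch can confirm; it only shows how to measure the two calls apart.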

asked Apr 17 '20 00:04

Max Power

1 Answer

There is not much to gain with Numba in this case as itertools.combinations is written in C.

If you want to benchmark it, here is a Numba / Python implementation of what itertools.combinations does:

@jit(nopython=True)
def using_numba(pool, r):
    n = len(pool)
    indices = list(range(r))
    empty = not(n and (0 < r <= n))

    if not empty:
        result = [pool[i] for i in indices]
        yield result

    while not empty:
        i = r - 1
        while i >= 0 and indices[i] == i + n - r:
            i -= 1
        if i < 0:
            empty = True
        else:
            indices[i] += 1
            for j in range(i+1, r):
                indices[j] = indices[j-1] + 1

            result = [pool[i] for i in indices]
            yield result

On my machine, this is about 15 times slower than itertools.combinations. Getting the permutations and filtering the combinations would certainly be even slower.
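Before benchmarking, it can help to confirm that the index-advancing logic produces the same sequence as itertools.combinations. The sketch below is a plain-Python copy of that logic with no @jit decorator, so it runs without numba; the name combinations_py is hypothetical, and the control flow is restructured slightly (early return instead of the empty flag). Like the generator above, it yields lists and yields nothing for r = 0, whereas itertools.combinations yields tuples and a single empty tuple for r = 0.

```python
import itertools

def combinations_py(pool, r):
    """Plain-Python copy of the index-advancing generator (no @jit)."""
    n = len(pool)
    if r <= 0 or r > n:
        return
    indices = list(range(r))
    yield [pool[i] for i in indices]
    while True:
        # Find the rightmost index that has not reached its maximum value.
        i = r - 1
        while i >= 0 and indices[i] == i + n - r:
            i -= 1
        if i < 0:
            return  # every index is at its maximum: done
        indices[i] += 1
        # Reset the indices to the right to the smallest valid values.
        for j in range(i + 1, r):
            indices[j] = indices[j - 1] + 1
        yield [pool[i] for i in indices]

pool = list(range(6))
for r in range(1, len(pool) + 1):
    ours = [tuple(c) for c in combinations_py(pool, r)]
    assert ours == list(itertools.combinations(pool, r))
print("matches itertools.combinations for r = 1 ..", len(pool))
```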

answered Oct 28 '22 11:10

Jacques Gaudin

Tags: python, combinations, itertools, numba
