I have some code which loops through a large set of itertools.combinations
,
which is now a performance bottleneck. I'm trying to turn to numba
's @jit(nopython=True)
to speed it up, but I'm running into some issues.
First, it seems numba can't handle itertools.combinations
itself, per this small example:
import itertools
import numpy as np
from numba import jit
arr = [1, 2, 3]
c = 2
@jit(nopython=True)
def using_it(arr, c):
return itertools.combinations(arr, c)
for i in using_it(arr, c):
print(i)
throw error: numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Unknown attribute 'combinations' of type Module(<module 'itertools' (built-in)>)
After some googling, I found this github issue where the questioner proposed this numba-safe function for calculating permutations:
@jit(nopython=True)
def permutations(A, k):
r = [[i for i in range(0)]]
for i in range(k):
r = [[a] + b for a in A for b in r if (a in b)==False]
return r
Leveraging that, I can then easily filter down to combinations:
@jit(nopython=True)
def combinations(A, k):
return [item for item in permutations(A, k) if sorted(item) == item]
Now I can run that combinations
function without errors and get the correct result. However, this is now dramatically slower with the @jit(nopython=True)
than without it. Running this timing test:
A = list(range(20)) # numba throws 'cannot determine numba type of range' w/o list
k = 2
start = pd.Timestamp.utcnow()
print(combinations(A, k))
print(f"took {pd.Timestamp.utcnow() - start}")
clocks in at 2.6 seconds with the numba @jit(nopython=True)
decorators, and under 1/000 of a second with them commented out. So that's not really a workable solution for me either.
Large data For larger input data, Numba version of function is must faster than Numpy version, even taking into account of the compiling time. In fact, the ratio of the Numpy and Numba run time will depends on both datasize, and the number of loops, or more general the nature of the function (to be compiled).
the name “Numba” comes from “NumPy” + “Mamba” which is a clever wordplay because Mamba snakes are known to crunch their prey quicker and generally faster than Pythons. The machine code generated by Numba is as fast as languages like C, C++, and Fortran without having to code in those languages.
Numba only supports the use of dict() without any arguments.
With Numba, you can speed up all of your calculation focused and computationally heavy python functions(eg loops). It also has support for numpy library! So, you can use numpy in your calculations too, and speed up the overall computation as loops in python are very slow.
There is not much to gain with Numba in this case as itertools.combinations
is written in C.
If you want to benchmark it, here is a Numba / Python implementation of what itertools.combinatiions
does:
@jit(nopython=True)
def using_numba(pool, r):
n = len(pool)
indices = list(range(r))
empty = not(n and (0 < r <= n))
if not empty:
result = [pool[i] for i in indices]
yield result
while not empty:
i = r - 1
while i >= 0 and indices[i] == i + n - r:
i -= 1
if i < 0:
empty = True
else:
indices[i] += 1
for j in range(i+1, r):
indices[j] = indices[j-1] + 1
result = [pool[i] for i in indices]
yield result
On my machine, this is about 15 times slower than itertools.combinations
. Getting the permutations and filtering the combinations would certainly be even slower.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With