Is it now conclusive that scipy.misc.comb is indeed faster than the ad-hoc implementation?
According to an old answer, Statistics: combinations in Python, this homebrew function is faster than scipy.misc.comb when calculating combinations nCr:
def choose(n, k):
    """
    A fast way to calculate binomial coefficients by Andrew Dalke (contrib).
    """
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0
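For example, choose(52, 5) returns 2598960, the number of possible 5-card poker hands.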
But after running some tests on my own machine, this no longer seems to be the case. I used the following script:
from scipy.misc import comb
import random, time

def choose(n, k):
    """
    A fast way to calculate binomial coefficients by Andrew Dalke (contrib).
    """
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

def timing(f):
    def wrap(*args):
        time1 = time.time()
        ret = f(*args)
        time2 = time.time()
        print '%s function took %0.3f ms' % (f.__name__, (time2 - time1) * 1000.0)
        return ret
    return wrap

@timing
def test_func(combination_func, nk):
    for n, k in nk:
        combination_func(n, k)

nk = []
for _ in range(1000):
    n = int(random.random() * 10000)
    k = random.randint(0, n)
    nk.append((n, k))

test_func(comb, nk)
test_func(choose, nk)
I get the following output:
$ python test.py
/usr/lib/python2.7/dist-packages/scipy/misc/common.py:295: RuntimeWarning: overflow encountered in exp
  vals = exp(lgam(N+1) - lgam(N-k+1) - lgam(k+1))
999
test_func function took 32.869 ms
999
test_func function took 1859.125 ms
$ python test.py
/usr/lib/python2.7/dist-packages/scipy/misc/common.py:295: RuntimeWarning: overflow encountered in exp
  vals = exp(lgam(N+1) - lgam(N-k+1) - lgam(k+1))
999
test_func function took 32.265 ms
999
test_func function took 1878.550 ms
Does this time profiling test show that the new scipy.misc.comb is faster than the ad-hoc choose() function? Is there any error in my test script that makes the timing inaccurate?
Why is scipy.misc.comb faster now? Is it because of some Cython / C wrapping tricks?
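On the timing-accuracy question: a hand-rolled decorator built on time.time() is susceptible to noise from other processes; the standard timeit module is a more robust alternative. Here is a minimal sketch, assuming it is appended to the test script above so that comb, choose, and nk are importable from __main__:

import timeit

# Time each function over the same pre-generated (n, k) pairs.
# repeat() runs the statement several times; the minimum across
# runs is the least-noisy estimate.
setup = "from __main__ import comb, choose, nk"
for name in ("comb", "choose"):
    stmt = "for n, k in nk: %s(n, k)" % name
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=1))
    print("%s took %.3f ms" % (name, best * 1000.0))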
After @WarrenWeckesser's comment:
With the default floating point approximation, scipy.misc.comb() breaks down because of floating point overflow (see http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.misc.comb.html for the documentation).
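A minimal sketch of the difference (assuming scipy 0.16-era scipy.misc.comb; in newer SciPy versions the same function lives in scipy.special):

from scipy.misc import comb  # scipy.special.comb in newer SciPy

n, k = 10000, 5000

# The default mode evaluates exp(lgam(N+1) - lgam(N-k+1) - lgam(k+1))
# in floating point, which overflows to inf for large n and emits the
# RuntimeWarning seen in the output above.
print(comb(n, k))

# exact=True computes with arbitrary-precision long integers instead,
# so it never overflows, at the cost of speed.
print(comb(n, k, exact=True))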
When tested with exact=True, which computes with long integers instead of floating point, it's a lot slower when computing 1000 combinations, using the test function below:
@timing
def test_func(combination_func, nk):
    for i, (n, k) in enumerate(nk):
        combination_func(n, k, exact=True)
[out]:
$ python test.py
test_func function took 3312.211 ms
test_func function took 1764.523 ms
$ python test.py
test_func function took 3320.198 ms
test_func function took 1782.280 ms
Referring to the source code of scipy.misc.comb, its exact-mode update routine is:
val = 1
for j in xrange(min(k, N-k)):
    val = (val*(N-j))//(j+1)
return val
whereas the update routine you suggested is:
ntok = 1
ktok = 1
for t in xrange(1, min(k, n - k) + 1):
    ntok *= n
    ktok *= t
    n -= 1
return ntok // ktok
My guess is that SciPy's exact implementation is slower because it performs an integer division at every iteration of the loop, whereas yours divides only once, at the return statement.
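One way to test that guess is to benchmark the two update routines in isolation. A sketch below, where comb_divide_each_step and comb_divide_once are hypothetical names standing in for SciPy's routine and the homebrew one:

import timeit

def comb_divide_each_step(N, k):
    # Mirrors SciPy's exact routine: one integer division per iteration.
    # Each division is exact, so val stays an integer throughout.
    val = 1
    for j in range(min(k, N - k)):
        val = (val * (N - j)) // (j + 1)
    return val

def comb_divide_once(n, k):
    # Mirrors the homebrew routine: accumulate numerator and denominator
    # separately and divide only once at the end.
    ntok = 1
    ktok = 1
    for t in range(1, min(k, n - k) + 1):
        ntok *= n
        ktok *= t
        n -= 1
    return ntok // ktok

for name in ("comb_divide_each_step", "comb_divide_once"):
    t = timeit.timeit("%s(10000, 5000)" % name,
                      setup="from __main__ import %s" % name,
                      number=10)
    print("%s: %.1f ms per call" % (name, t * 100.0))

If the guess holds, the thousands of intermediate bignum divisions in the first routine cost more than the extra multiplication per iteration plus the single large division at the end in the second.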