Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is `scipy.misc.comb` faster than an ad-hoc binomial computation?

Is it conclusive that now the scipy.misc.comb is indeed faster than the ad-hoc implementation?

According to an old answer, Statistics: combinations in Python, this homebrew function is faster than scipy.misc.comb when calculating combinations nCr:

def choose(n, k):     """     A fast way to calculate binomial coefficients by Andrew Dalke (contrib).     """     if 0 <= k <= n:         ntok = 1         ktok = 1         for t in xrange(1, min(k, n - k) + 1):             ntok *= n             ktok *= t             n -= 1         return ntok // ktok     else:         return 0 

But after running some tests on my own machine, this doesn't seem like the case, using this script:

from scipy.misc import comb import random, time  def choose(n, k):     """     A fast way to calculate binomial coefficients by Andrew Dalke (contrib).     """     if 0 <= k <= n:         ntok = 1         ktok = 1         for t in xrange(1, min(k, n - k) + 1):             ntok *= n             ktok *= t             n -= 1         return ntok // ktok     else:         return 0  def timing(f):     def wrap(*args):         time1 = time.time()         ret = f(*args)         time2 = time.time()         print '%s function took %0.3f ms' % (f.__name__, (time2-time1)*1000.0)         return ret     return wrap  @timing def test_func(combination_func, nk):     for n,k in nk:         combination_func(n, k)  nk = [] for _ in range(1000):     n = int(random.random() * 10000)     k = random.randint(0,n)     nk.append((n,k))  test_func(comb, nk) test_func(choose, nk) 

I get the following output:

$ python test.py /usr/lib/python2.7/dist-packages/scipy/misc/common.py:295: RuntimeWarning: overflow encountered in exp   vals = exp(lgam(N+1) - lgam(N-k+1) - lgam(k+1)) 999 test_func function took 32.869 ms 999 test_func function took 1859.125 ms  $ python test.py /usr/lib/python2.7/dist-packages/scipy/misc/common.py:295: RuntimeWarning: overflow encountered in exp   vals = exp(lgam(N+1) - lgam(N-k+1) - lgam(k+1)) 999 test_func function took 32.265 ms 999 test_func function took 1878.550 ms 

Did the time profiling test show that the new scipy.misc.comb is faster than the ad-hoc choose() function? Is there any error on my test script that makes the timing inaccurate?

Why is it that the scipy.misc.comb is faster now? It is because of some cython / c wrapping tricks?


EDITED

After @WarrenWeckesser comment:

Using the default floating point approximation when using scipy.misc.comb(), the computation breaks because of floating point overflow.

(see http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.misc.comb.html for documentation)

When tested with exact=True which computes with long integers instead of floating point using the function below, it's a lot slower when computing 1000 combinations:

@timing def test_func(combination_func, nk):     for i, (n,k) in enumerate(nk):         combination_func(n, k, exact=True) 

[out]:

$ python test.py test_func function took 3312.211 ms test_func function took 1764.523 ms  $ python test.py test_func function took 3320.198 ms test_func function took 1782.280 ms 
like image 579
alvas Avatar asked Nov 02 '15 00:11

alvas


1 Answers

Referring to the source code of scipy.misc.comb, the update routine of the result is:

    val = 1     for j in xrange(min(k, N-k)):         val = (val*(N-j))//(j+1)     return val 

whereas the update routine you suggested is:

    ntok = 1     ktok = 1     for t in xrange(1, min(k, n - k) + 1):         ntok *= n         ktok *= t         n -= 1     return ntok // ktok 

My guess of the reason why SciPy's implementation is slower is due to the fact that the subroutine involves an integer division at each iteration while yours only calls division once at the return statement.

like image 88
Victor Avatar answered Sep 20 '22 18:09

Victor