I have the following code to generate all possible combinations in specified range using itertools but I cant get any speed improvements from using the code with cython. Original code is this: <pre class="prettyprint"><code>from itertools import * def x(e,f,g): a=[] for c in combinations(range(e, f),g): d = list((c)) a.append(d) </code></pre> and after declaring types for cython: <pre class="prettyprint"><code>from itertools import * cpdef x(int e,int f,int g): cpdef tuple c cpdef list a cpdef list d a=[] for c in combinations(range(e, f),g): d = list((c)) a.append(d) </code></pre> I saved the latter as <code>test_cy.pyx</code> and compiled using <code>cythonize -a -i test_cy.pyx</code> After compiling, I created a new script with the following code and ran it: <pre class="prettyprint"><code>import test_cy test_cy.x(1,45,6) </code></pre> I didnt get any significant speed improvement, still took about the same time as the original script, about 10.8 sec. Is there anything I did wrong or is itertools already so optimised that there cant be any bigger improvements to its speed?

As already pointed out in the comments, you should not expect cython to speed-up your code because the most of the time the algorithm spends in itertools and creation of lists. Because I'm curios to see how <code>itertools</code>'s generic implementation fares against old-school-tricks, let's take a look at this Cython implementation of "all subsets k out of n": <pre class="prettyprint"><code>%%cython ctypedef unsigned long long ull cdef ull next_subset(ull subset): cdef ull smallest, ripple, ones smallest = subset& -subset ripple = subset + smallest ones = subset ^ ripple ones = (ones >> 2)//smallest subset= ripple | ones return subset cdef subset2list(ull subset, int offset, int cnt): cdef list lst=[0]*cnt #pre-allocate cdef int current=0; cdef int index=0 while subset>0: if((subset&1)!=0): lst[index]=offset+current index+=1 subset>>=1 current+=1 return lst def all_k_subsets(int start, int end, int k): cdef int n=end-start cdef ull MAX=1L<<n; cdef ull subset=(1L<<k)-1L; lst=[] while(MAX>subset): lst.append(subset2list(subset, start, k)) subset=next_subset(subset) return lst </code></pre> This implementation uses some well-known bit-tricks and has the limitation, that it only works for at most 64 elements. If we compare both approaches: <pre class="prettyprint"><code>>>> %timeit x(1,45,6) 2.52 s ± 108 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) >>> %timeit all_k_subsets(1,45,6) 1.29 s ± 5.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) </code></pre> The speed-up of factor 2 is quite disappointing. However, the bottle-neck is the creation of the lists and not the calculation itself - it is easy to check, that without list creation the calculation would take about 0.1 seconds. My take away from it: if you are serious about speed you should not create so many lists but proceed the subset on the fly (best in cython) - a speed-up of more than 10 is possible. If it is a must to have all subsets as lists, so you cannot expect a huge speed-up.

Speeding up itertools combinations with cython

I have the following code to generate all possible combinations in specified range using itertools but I cant get any speed improvements from using the code with cython. Original code is this:

from itertools import *

def x(e,f,g):
    a=[]        
    for c in combinations(range(e, f),g):
        d = list((c))
        a.append(d)

and after declaring types for cython:

from itertools import *

cpdef x(int e,int f,int g):        
    cpdef tuple c
    cpdef list a
    cpdef list d

    a=[]        
    for c in combinations(range(e, f),g):    
        d = list((c))
        a.append(d)

I saved the latter as test_cy.pyx and compiled using cythonize -a -i test_cy.pyx

After compiling, I created a new script with the following code and ran it:

import test_cy

test_cy.x(1,45,6)

I didnt get any significant speed improvement, still took about the same time as the original script, about 10.8 sec.

Is there anything I did wrong or is itertools already so optimised that there cant be any bigger improvements to its speed?

Is Itertools combinations fast?

Itertools in Python is a module that produces complex iterators with the help of methods that work on iterators. This module works as a fast, memory-efficient tool that is used either by itself or in combination to form iterator algebra.

What does Itertools Zip_longest return?

itertools. zip_longest (*iterables, fillvalue=None) Make an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled-in with fillvalue. Iteration continues until the longest iterable is exhausted.

What does Itertools combination return?

What does itertools. combinations() do ? It returns r length subsequences of elements from the input iterable.

As already pointed out in the comments, you should not expect cython to speed-up your code because the most of the time the algorithm spends in itertools and creation of lists.

Because I'm curios to see how itertools's generic implementation fares against old-school-tricks, let's take a look at this Cython implementation of "all subsets k out of n":

%%cython

ctypedef unsigned long long ull

cdef ull next_subset(ull subset):
   cdef ull smallest, ripple, ones
   smallest = subset& -subset      
   ripple = subset + smallest    
   ones = subset ^ ripple    
   ones = (ones >> 2)//smallest 
   subset= ripple | ones    
   return subset


cdef subset2list(ull subset, int offset, int cnt):  
    cdef list lst=[0]*cnt #pre-allocate
    cdef int current=0;
    cdef int index=0
    while subset>0:
        if((subset&1)!=0):
            lst[index]=offset+current
            index+=1
        subset>>=1
        current+=1
    return lst

def all_k_subsets(int start, int end, int k):
    cdef int n=end-start      
    cdef ull MAX=1L<<n;
    cdef ull subset=(1L<<k)-1L;
    lst=[]
    while(MAX>subset):
         lst.append(subset2list(subset, start, k))
         subset=next_subset(subset)
    return lst

This implementation uses some well-known bit-tricks and has the limitation, that it only works for at most 64 elements.

If we compare both approaches:

>>> %timeit x(1,45,6)
2.52 s ± 108 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit all_k_subsets(1,45,6)
1.29 s ± 5.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The speed-up of factor 2 is quite disappointing.

However, the bottle-neck is the creation of the lists and not the calculation itself - it is easy to check, that without list creation the calculation would take about 0.1 seconds.

My take away from it: if you are serious about speed you should not create so many lists but proceed the subset on the fly (best in cython) - a speed-up of more than 10 is possible. If it is a must to have all subsets as lists, so you cannot expect a huge speed-up.

Speeding up itertools combinations with cython

Tags:

python

itertools

cython

West

People also ask

1 Answers

ead

Recent Activity

Donate For Us

Speeding up itertools combinations with cython

Tags:

python

itertools

cython

West

People also ask

1 Answers

ead

Related questions

Recent Activity

Donate For Us