Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

itertools product should not contain combination having duplicate values

I am trying to create combinations. Sample code is as follow:

a = [1, 2, 3], [1, 4, 5]
combinations = list(itertools.product(*a))

Output:

[(1, 1), (1, 4), (1, 5), (2, 1), (2, 4), (2, 5), (3, 1), (3, 4), (3, 5)]

I don't need combination (1,1). I have already tried following code:

for comb in combinations:
    if comb[0] == comb[1]:
        combinations.remove(comb)

But as I have to execute this on a large data. It's taking too much time.

Also the elements in combination should be equal to the number of items in the list. For example: a = [1,2,3], [2,3,7],[4,5,1] the elements in each combination would be 3, like (1,2,4)

Please suggest a way to avoid such combinations.

like image 373
panr Avatar asked Jul 02 '16 11:07

panr


2 Answers

For two iterables a simple list comprehension would do:

>>> from itertools import product
>>> a = [1, 2, 3], [1, 4, 5]
>>> [(x, y) for x, y in product(*a) if x != y]
[(1, 4), (1, 5), (2, 1), (2, 4), (2, 5), (3, 1), (3, 4), (3, 5)]

If you need to filter a product of arbitrary number of iterables, then it's better to use sets to check that all elements of a combination are distinct:

>>> a = [1, 2, 3], [1, 4, 5], [1, 8, 9]
>>> [p for p in product(*a) if len(set(p)) == len(p)]
[(1, 4, 8), (1, 4, 9), (1, 5, 8), (1, 5, 9), (2, 1, 8), (2, 1, 9), (2, 4, 1), (2, 4, 8), (2, 4, 9), (2, 5, 1), (2, 5, 8), (2, 5, 9), (3, 1, 8), (3, 1, 9), (3, 4, 1), (3, 4, 8), (3, 4, 9), (3, 5, 1), (3, 5, 8), (3, 5, 9)]

BTW, never alter the list you're looping on, because that's quite likely to produce an incorrect loop.

like image 60
Eugene Yarmash Avatar answered Sep 22 '22 15:09

Eugene Yarmash


In case your list of lists can have more than two sublists, you could compare the size of the tuple to the size of the tuple after converting it to a set, thus filtering duplicate elements.

>>> import itertools
>>> b = [1,2,3],[1,4,5],[1,2,6]
>>> [x for x in itertools.product(*b) if len(x) == len(set(x))]
[(1, 4, 2), (1, 4, 6), (1, 5, 2), (1, 5, 6), 
 (2, 1, 6), (2, 4, 1), (2, 4, 6), (2, 5, 1), (2, 5, 6), 
 (3, 1, 2), (3, 1, 6), (3, 4, 1), (3, 4, 2), (3, 4, 6), (3, 5, 1), (3, 5, 2), (3, 5, 6)]

It works fine but takes too long when the number of lists are more than 20. [...] Max count of lists would be 40 & items in each list can go up to 20.

Those are awfully many and awfully large lists. Even for 20 lists with 3 elements each you would have to go through 3^20 = 3,486,784,401 combinations. This can still be feasible, but you should use a generator expression, instead of a list comprehension, i.e. (...) instead of [...]:

gen = (x for x in itertools.product(*b) if len(x) == len(set(x)))
for x in gen:
    # do stuff

For 40 lists with 20 elements each, you get a whopping 10^52 combinations. The universe will likely die before you generate all of those. Assuming that most of those (almost all, in fact) would contain duplicates, you could try a more clever algorithm, skipping entire 'branches' of combinations as soon as you encounter the first duplicate, but I doubt that even those will help much.

like image 27
tobias_k Avatar answered Sep 21 '22 15:09

tobias_k