Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python - itertools.product without using element more than once

I have a list of lists, and I would like to duplicate the effect of itertools.product() without using any element more than once.

>>> list = [['A', 'B'], ['C', 'D'], ['A', 'B']]
>>> [''.join(e) for e in itertools.product(*list)]
['ACA', 'ACB', 'ADA', 'ADB', 'BCA', 'BCB', 'BDA', 'BDB']
>>> # Desired output: ['ACB', 'ADB', 'BCA', 'BDA']

The list I need to use this on is too large to compute itertools.product and remove the unneeded elements. (25 billion permutations from itertools.product, while my desired output only has ~500,000). Preferably, an answer will be iterable.

Edit: I know "product" is the wrong term for what I need, but I'm struggling to find the word I'm looking for.

Edit2: This is the list that I desire to perform this operation on:

[['A', 'H', 'L'], ['X', 'B', 'I'], ['Q', 'C', 'V'], ['D', 'N'], ['E', 'F'], ['E', 'F'], ['G'], ['A', 'H', 'L'], ['X', 'B', 'I'], ['W', 'U', 'J', 'K', 'M'], ['W', 'U', 'J', 'K', 'M'], ['A', 'H', 'L'], ['W', 'U', 'J', 'K', 'M'], ['D', 'N'], ['P', 'O', 'T'], ['P', 'O', 'T'], ['Q', 'C', 'V'], ['R'], ['S'], ['P', 'O', 'T'], ['W', 'U', 'J', 'K', 'M'], ['Q', 'C', 'V'], ['W', 'U', 'J', 'K', 'M'], ['X', 'B', 'I']]
like image 891
user72528 Avatar asked Mar 02 '18 02:03

user72528


3 Answers

A simple stack-based implementation:

def product1(l): return product1_(l,0,[])
def product1_(l,i,buf):
  if i==len(l): yield buf
  else:
    for x in l[i]:
      if x not in buf:
        buf.append(x)
        yield from product1_(l,i+1,buf)
        buf.pop()

This is a bit slower than Patrick Haugh's answer (I get 18 s for your test case), but it gives the results in a predictable order.

Note that you have to process the values as it generates "them", since they're all the same list buf; you could write yield tuple(buf) or yield "".join(buf) to generate separate "cooked" values (at a cost of less than one additional second).

If the values are letters, you could use a "bitmask" list to implement the collision test, which reduces the time to about 13 s (but using a set is just as fast). Other possible optimizations include processing lists with fewer eligible elements first, to reduce backtracking; this can get this case down to 11 s.

like image 52
Davis Herring Avatar answered Oct 24 '22 23:10

Davis Herring


test1 = [['A', 'B'], ['C', 'D'], ['A', 'B']]                                                                                           
test2 = [['A', 'H', 'L'], ['X', 'B', 'I'], ['Q', 'C', 'V'], ['D', 'N'], ['E', 'F'], ['E', 'F'], ['G'], ['A', 'H', 'L'],
         ['X', 'B', 'I'], ['W', 'U', 'J', 'K', 'M'], ['W', 'U', 'J', 'K', 'M'], ['A', 'H', 'L'],
         ['W', 'U', 'J', 'K', 'M'], ['D', 'N'], ['P', 'O', 'T'], ['P', 'O', 'T'], ['Q', 'C', 'V'], ['R'], ['S'],
         ['P', 'O', 'T'], ['W', 'U', 'J', 'K', 'M'], ['Q', 'C', 'V'], ['W', 'U', 'J', 'K', 'M'], ['X', 'B', 'I']]


def prod(curr, *others):
    if not others:
        for x in curr:
            yield {x} # (x,) for tuples
    else:
        for o in prod(*others):
            for c in curr:
                if c not in o:
                    yield {c, *o} # (c, *o) for tuples

print([''.join(x) for x in prod(*test1)])
# ['ABC', 'ABD', 'ABC', 'ABD'] 
print(sum(1 for x in prod(*test2)))
# 622080

The longer input takes about five seconds to run on my machine. I use sets to pass values around because they are much more efficent than tuples or lists when it comes to membership checks. If necessary, you can use tuples, it will just be slower.

Some questions to think about: does order matter? What do you want to happen when we can't use an item from the current list (because they've all already been used)?

like image 25
Patrick Haugh Avatar answered Oct 24 '22 21:10

Patrick Haugh


Your specific case has an interesting property. If we arrange it in a counter, we see that every list occurs as many times as its entries:

Counter({('A', 'H', 'L'): 3,
         ('D', 'N'): 2,
         ('E', 'F'): 2,
         ('G',): 1,
         ('P', 'O', 'T'): 3,
         ('Q', 'C', 'V'): 3,
         ('R',): 1,
         ('S',): 1,
         ('W', 'U', 'J', 'K', 'M'): 5,
         ('X', 'B', 'I'): 3})

In other words, ignoring order, the sequences you want are the cartesian products of permutations of your lists. Suppose your list is called l. Then we can assemble a list of all the permutations of the sublists and take their product:

c = set(tuple(i) for i in l)
permutations = [list(itertools.permutations(i)) for i in c]
permutation_products = itertools.products(*permutations)

An element of permutation_products looks something like:

(('W', 'U', 'J', 'K', 'M'),
 ('E', 'F'),
 ('X', 'B', 'I'),
 ('Q', 'C', 'V'),
 ('P', 'O', 'T'),
 ('D', 'N'),
 ('G',),
 ('S',),
 ('R',),
 ('A', 'L', 'H'))

We have to get it back into the right order. Suppose our permutation is called perm. For each sublist of your list, we have to locate the correct element of perm and then take the next letter in the permutation. We can do this by making a dictionary:

perm_dict = {frozenset(p): list(p) for p in perm}

Then, to construct a single permutation, we have:

s = "".join([perm_dict[frozenset(i)].pop() for i in l])

We can combine all this into a generator:

def permute_list(l):
    c = set(tuple(i) for i in l)
    permutations = [list(itertools.permutations(i)) for i in c]
    permutation_products = itertools.product(*permutations)
    for perm in permutation_products:
        perm_dict = {frozenset(p): list(p) for p in perm}
        yield "".join([perm_dict[frozenset(i)].pop() for i in l])
like image 45
hoyland Avatar answered Oct 24 '22 23:10

hoyland