Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: Produce all possible sequence combination from a list with character limit

My question is exactly the same as this question. I have the array (list) of characters. I would like to get all possible sequence combinations from that list but with the limit of characters (for example: 2 characters as maximum). Further, no single character can be repeated in a permutation line:

chars = ['a', 'b', 'c', 'd']

# output
output = [['a', 'b', 'c', 'd'],
          ['ab', 'c', 'd'],
          ['a', 'bc', 'd'],
          ['a', 'b', 'cd'],
          ['ab', 'cd'],
          ['abc', 'd'], # this one will be exempted
          ['a', 'bcd'],  # this one will be exempted
          ['abcd']]  # this one will be exempted

I know I can check the condition to omit the over-limit characters combinations while generating and building the sequence. But it will add the run time. My purpose is to reduce the existing execution time.

Without the character count limitation, the combinations will be generated like 2^(N-1). If the list is over 15 characters, it will take too long to execute the program. Therefore I would like to reduce the combinations count by character limit.

The priority is performance. I already research and tried for two days without any success.

like image 924
Steve.NayLinAung Avatar asked Oct 16 '22 20:10

Steve.NayLinAung


1 Answers

One way to do it is to iterate over the input list and gradually build up the combinations. In each step, the next character is taken from the input list and added to the previously generated combinations.

from collections import defaultdict

def make_combinations(seq, maxlen):
    # memo is a dict of {length_of_last_word: list_of_combinations}
    memo = defaultdict(list)
    memo[1] = [[seq[0]]]  # put the first character into the memo

    seq_iter = iter(seq)
    next(seq_iter)  # skip the first character
    for char in seq_iter:
        new_memo = defaultdict(list)

        # iterate over the memo and expand it
        for wordlen, combos in memo.items():
            # add the current character as a separate word
            new_memo[1].extend(combo + [char] for combo in combos)

            # if the maximum word length isn't reached yet, add a character to the last word
            if wordlen < maxlen:
                word = combos[0][-1] + char

                new_memo[wordlen+1] = newcombos = []
                for combo in combos:
                    combo[-1] = word  # overwrite the last word with a longer one
                    newcombos.append(combo)

        memo = new_memo

    # flatten the memo into a list and return it
    return [combo for combos in memo.values() for combo in combos]

Output:

[['a', 'b', 'c', 'd'], ['ab', 'c', 'd'], ['a', 'bc', 'd'],
 ['a', 'b', 'cd'], ['ab', 'cd']]

This implementation is slower than the naive itertools.product approach for short inputs:

input: a b c d
maxlen: 2
iterations: 10000

itertools.product: 0.11653625800136069 seconds
make_combinations: 0.16573870600041118 seconds

But it picks up quickly when the input list is longer:

input: a b c d e f g h i j k
maxlen: 2
iterations: 10000

itertools.product: 6.9087735799985240 seconds
make_combinations: 1.2037671390007745 seconds
like image 181
Aran-Fey Avatar answered Nov 03 '22 05:11

Aran-Fey