Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to generate permutations of a list of strings and their substrings

This algorithm has been escaping me for some time now. Lets say I'm given the string "cccaatt". I'm trying to generate all the possible variations of each substring of repeated letters. EG, "cccaatt" as an input would return:

cat, catt, caat, caatt, ccat, ccatt, ccaat, ccaatt, cccat, cccatt, cccaat, cccaatt

The order of the results does not matter, so long as it returns all of them. Generally, the input is a string, consisting of g groups of repeated letters, each group k_n letters long.

My intuition is that this is a recursive algorithm, but the exact structure of it has been difficult to understand.

like image 473
Collin Watts Avatar asked Oct 07 '22 17:10

Collin Watts


2 Answers

If you store the alphabet and maximum occurrences of each letter (as awesomely mentioned in the comments) you can do this:

function variations(letter_type, current string) {
    if (letter_type is in the alphabet) {
        while (string has fewer than the max amount of that letter) {
            add one of that letter to current string
            variations(next letter, current string)
        }
    } else {
        print current string // since there are no more letters to add
    }
}

In Java:

public class Variations {

    static String[] alphabet = {"c","a","t"};
    static int[] maximums = {3, 2, 2};

    public static void main(String[] args) {
        variations(0, "");
    }

    public static void variations(int letter_type, String curr) {
        if (letter_type < alphabet.length) {
            for (int i = 1; i <= maximums[letter_type]; i++) {
            curr += alphabet[letter_type];
            variations(letter_type+1, curr);
            }
        } else {
            System.out.println(curr);
        } 
    }

}
like image 98
chm Avatar answered Oct 13 '22 14:10

chm


Decompose the string into a list of numbers and the number of repeats, i.e. "cccaatt" => [(c,3), (a,2), (t,2)]. then the problem could be defined recursively.

Let xs = [(a_1, n_1), (a_2, n_2), (a_3, n_3), ... (a_k, n_k)]
define Perm(xs):
    if len(xs) == 1:
        return all length variations of xs
    else:
        return every sequence in Perm(x[:-1]) appended with one or more from x[-1]

I'll have a python example shortly.

> perm("cccaatt")
> ['cat', 'ccat', 'cccat', 'caat', 'ccaat', 'cccaat', 'catt', 'ccatt', 'cccatt', 'caatt', 'ccaatt', 'cccaatt']

Code attached

def perm(xs):
    if not xs:
        return []

    # group them into the correct format, probably should have used groupby + zip
    l = [(xs[0],1)]
    for x in xs[1:]:
        last, num = l[-1]
        if last == x:
            l[-1] = (last, num+1)
        else:
            l.append((x, 1))

    # print(l)
    print(recurse(l))

# this is where the real work is done.
def recurse(xs):
    if len(xs) == 1:
        return [ xs[0][0] * x for x in range(1, xs[0][1] + 1) ]

    prev = recurse(xs[:-1])
    char, num = xs[-1]
    return [ y + x * char for x in range(1,num + 1) for y in prev ]
like image 26
dpington Avatar answered Oct 13 '22 14:10

dpington