Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can i determine if a string is a concatenation of a string list

Suppose we are given a string S, and a list of some other strings L.

How can we know if S is a one of all the possible concatenations of L?

For example:

S = "abcdabce"

L = ["abcd", "a", "bc", "e"]

S is "abcd" + "a" + "bc" + "e", then S is a concatenation of L, whereas "ababcecd" is not.

In order to solve this question, I tried to use DFS/backtracking. The pseudo code is as follows:

boolean isConcatenation(S, L) {
    if (L.length == 1 && S == L[0]) return true;
    for (String s: L) {
        if (S.startwith(s)) {
            markAsVisited(s);
            if (isConcatnation(S.exclude(s), L.exclude(s))) 
                return true;
            markAsUnvisited(s);
        }
    }
    return false;
}

However, DFS/backtracking is not a efficient solution. I am curious what is the fastest algorithm to solve this question or if there is any other algorithm to solve it in a faster way. I hope there are algorithms like KMP, which can solve it in O(n) time.

like image 224
Yifei Avatar asked Jul 18 '13 19:07

Yifei


2 Answers

In python:

>>> yes = 'abcdabce'
>>> no = 'ababcecd'
>>> L = ['abcd','a','bc','e']
>>> yes in [''.join(p) for p in itertools.permutations(L)]
True
>>> no in [''.join(p) for p in itertools.permutations(L)]
False

edit: as pointed out, this is n! complex, so is inappropriate for large L. But hey, development time under 10 seconds.

You can instead build your own permutation generator, starting with the basic permutator:

def all_perms(elements):
    if len(elements) <=1:
        yield elements
    else:
        for perm in all_perms(elements[1:]):
            for i in range(len(elements)):
                yield perm[:i] + elements[0:1] + perm[i:]

And then discard branches that you don't care about by tracking what the concatenation of the elements would be and only iterating if it adds up to your target string.

    def all_perms(elements, conc=''):
    ...
            for perm in all_perms(elements[1:], conc + elements[0]):
    ...
                if target.startswith(''.join(conc)):
    ...
like image 174
roippi Avatar answered Sep 28 '22 00:09

roippi


A dynamic programming approach would be to work left to right, building up an array A[x] where A[x] is true if the first x characters of the string form one of the possible concatenations of L. You can work out A[n] given earlier A[n] by checking each possible string in the list - if the characters of S up to the nth character match a candidate string of length k and if A[n-k] is true, then you can set A[n] true.

I note that you can use https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm to find the matches you need as input to the dynamic program - the matching costs will be linear in the size of the input string, the total size of all candidate strings, and the number of matches between the input string and candidate strings.

like image 38
mcdowella Avatar answered Sep 28 '22 01:09

mcdowella