Suppose we are given a string S, and a list of some other strings L.
How can we know if S is a one of all the possible concatenations of L?
For example:
S = "abcdabce"
L = ["abcd", "a", "bc", "e"]
S is "abcd" + "a" + "bc" + "e", then S is a concatenation of L, whereas "ababcecd" is not.
In order to solve this question, I tried to use DFS/backtracking. The pseudo code is as follows:
boolean isConcatenation(S, L) {
if (L.length == 1 && S == L[0]) return true;
for (String s: L) {
if (S.startwith(s)) {
markAsVisited(s);
if (isConcatnation(S.exclude(s), L.exclude(s)))
return true;
markAsUnvisited(s);
}
}
return false;
}
However, DFS/backtracking is not a efficient solution. I am curious what is the fastest algorithm to solve this question or if there is any other algorithm to solve it in a faster way. I hope there are algorithms like KMP, which can solve it in O(n) time.
In python:
>>> yes = 'abcdabce'
>>> no = 'ababcecd'
>>> L = ['abcd','a','bc','e']
>>> yes in [''.join(p) for p in itertools.permutations(L)]
True
>>> no in [''.join(p) for p in itertools.permutations(L)]
False
edit: as pointed out, this is n! complex, so is inappropriate for large L. But hey, development time under 10 seconds.
You can instead build your own permutation generator, starting with the basic permutator:
def all_perms(elements):
if len(elements) <=1:
yield elements
else:
for perm in all_perms(elements[1:]):
for i in range(len(elements)):
yield perm[:i] + elements[0:1] + perm[i:]
And then discard branches that you don't care about by tracking what the concatenation of the elements would be and only iterating if it adds up to your target string.
def all_perms(elements, conc=''):
...
for perm in all_perms(elements[1:], conc + elements[0]):
...
if target.startswith(''.join(conc)):
...
A dynamic programming approach would be to work left to right, building up an array A[x] where A[x] is true if the first x characters of the string form one of the possible concatenations of L. You can work out A[n] given earlier A[n] by checking each possible string in the list - if the characters of S up to the nth character match a candidate string of length k and if A[n-k] is true, then you can set A[n] true.
I note that you can use https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm to find the matches you need as input to the dynamic program - the matching costs will be linear in the size of the input string, the total size of all candidate strings, and the number of matches between the input string and candidate strings.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With