Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging sequence of symbols

I have a problem and I need an algorithm to solve it.
I could not find it and I do not know if the problem is NP-Hard.

The problem is: I have several sequences of symbols. I want to merge them into a single sequence, where all symbols of the original sequences are included keeping the original order of the symbols. Duplicated symbols that came from different sequences should be removed. The resulting sequence must be the smallest sequence that is possible.

If one of the sequences is "abc", the resulting sequence must be *a*b*c*, where * is a sequence of 0 or more symbols. If the input sequences are 'abc' and 'cba', the output must be 'abcba', 'c' is included once and the *a*b*c* and *c*b*a* property is kept.

Example:

Input:

abcde
xbcaf
axdaf

A way how it is merged:

a-bcde--
-xbc--af
ax--d-af

Output:

axbcdeaf

Multiple outputs is possible, 'abcd' and 'cba' will result in 'abcdba', 'abcbda' or 'abcbad'. I will need just one output, any output is valid, if its length is the smallest lenght that is possible.

Thank

like image 799
Zeze Avatar asked Jul 25 '12 20:07

Zeze


1 Answers

This problem is called Shortest Common Supersequence and is known to be NP-complete. If you're okay with approximations you can search around and find things like this: An Approximation Algorithm for the Shortest Common Supersequence Problem: An Experimental Analysis, Barone et. al (pdf).

like image 182
mhum Avatar answered Oct 06 '22 00:10

mhum