String Tiling Algorithm

I'm looking for an efficient algorithm to do string tiling. Basically, you are given a list of strings, say BCD, CDE, ABC, A, and the resulting tiled string should be ABCDE, because BCD aligns with CDE yielding BCDE, which is then aligned with ABC yielding the final ABCDE.

Currently, I'm using a slightly naïve algorithm that works as follows. Starting with an arbitrary pair of strings, say BCD and CDE, I use the following (in Java):

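public static String tile(String first, String second) {
  // Try every offset i; substring(i) is the suffix starting at index i.
  for (int i = 0; i < first.length() || i < second.length(); i++) {
    // "right" tile (e.g., "BCD" and "CDE"): a suffix of first occurs in second
    String firstTile = first.substring(i);
    // "left" tile (e.g., "CDE" and "BCD"): a suffix of second occurs in first
    String secondTile = second.substring(i);
    // Note: contains() matches anywhere in the other string, not only at its start.
    if (second.contains(firstTile)) {
      return first.substring(0, i) + second;
    } else if (first.contains(secondTile)) {
      return second.substring(0, i) + first;
    }
  }
  // No tiling found (EMPTY is presumably an empty-string constant defined elsewhere).
  return EMPTY;
}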

System.out.println(tile("CDE", "ABCDEF")); // ABCDEF
System.out.println(tile("BCD", "CDE")); // BCDE
System.out.println(tile("CDE", "ABC")); // ABCDE
System.out.println(tile("ABC", tile("BCX", "XYZ"))); // ABCXYZ

Although this works, it's not very efficient, as it iterates over the same characters over and over again.

So, does anybody know a better (more efficient) algorithm for this? This problem is similar to DNA sequence alignment, so any advice from someone in that field (and others, of course) is very much welcome. Also note that I'm not looking for an alignment but a tiling, because I require a full overlap of one of the strings over the other.

I'm currently looking for an adaptation of the Rabin-Karp algorithm, in order to improve the asymptotic complexity of the algorithm, but I'd like to hear some advice before delving any further into this matter.
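As a rough illustration of the Rabin-Karp idea mentioned above, here is a minimal sketch that finds the longest suffix of one string that is also a prefix of the other by comparing rolling hashes, and verifies each hash match with a direct character comparison. The class name RollingHashOverlap and the BASE and MOD constants are assumptions made for this example, not part of the original post. The sketch handles only one orientation (first on the left) and does not cover the case where one string is contained in the other.

```java
final class RollingHashOverlap {
    // Illustrative hash parameters (assumptions, not from the original post).
    private static final long BASE = 257;
    private static final long MOD = 1_000_000_007L;

    // Length of the longest suffix of first that is also a prefix of second.
    static int overlap(String first, String second) {
        int max = Math.min(first.length(), second.length());
        long suffixHash = 0; // hash of the last k characters of first
        long prefixHash = 0; // hash of the first k characters of second
        long power = 1;      // BASE^(k-1) mod MOD
        int best = 0;
        for (int k = 1; k <= max; k++) {
            char fromFirst = first.charAt(first.length() - k);
            char fromSecond = second.charAt(k - 1);
            // Growing the suffix adds a new most-significant character.
            suffixHash = (suffixHash + fromFirst * power) % MOD;
            // Growing the prefix appends a new least-significant character.
            prefixHash = (prefixHash * BASE + fromSecond) % MOD;
            power = (power * BASE) % MOD;
            // Equal hashes are only a hint; confirm with a real comparison.
            if (suffixHash == prefixHash
                    && first.regionMatches(first.length() - k, second, 0, k)) {
                best = k;
            }
        }
        return best;
    }

    // Appends second to first, sharing the overlapping characters once.
    static String tile(String first, String second) {
        return first + second.substring(overlap(first, second));
    }

    public static void main(String[] args) {
        System.out.println(tile("BCD", "CDE"));   // BCDE
        System.out.println(tile("ABC", "BCXYZ")); // ABCXYZ
    }
}
```

Each candidate length costs constant time unless the hashes agree, so the scan is close to linear when hash matches are rare. Inputs with many genuine matches (for example, long runs of the same character) still pay for the verification step each time.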

Thanks in advance.

Where the input is ambiguous, e.g. {ABC, CBA}, which could yield either ABCBA or CBABC, any valid tiling can be returned. This seldom occurs in practice, though, because I'm tiling words, e.g. {This is, is me} => {This is me}, which are manipulated so that the algorithm above works.
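To make the ambiguity concrete, the following sketch tries both orientations of a pair and keeps the one with the larger overlap, falling back to the first orientation on a tie. It uses a deliberately simple quadratic overlap check for readability. The class name EitherOrderTiling and the tie-breaking rule are assumptions for illustration only.

```java
final class EitherOrderTiling {
    // Longest suffix of left that is also a prefix of right (simple, quadratic).
    static int overlap(String left, String right) {
        for (int k = Math.min(left.length(), right.length()); k > 0; k--) {
            if (left.endsWith(right.substring(0, k))) {
                return k;
            }
        }
        return 0;
    }

    // Tiles a and b in whichever order shares more characters.
    static String tile(String a, String b) {
        if (a.contains(b)) return a; // b lies entirely inside a
        if (b.contains(a)) return b; // a lies entirely inside b
        int ab = overlap(a, b);      // a on the left, b on the right
        int ba = overlap(b, a);      // b on the left, a on the right
        // On a tie, the argument order decides (an arbitrary choice).
        return ab >= ba ? a + b.substring(ab) : b + a.substring(ba);
    }

    public static void main(String[] args) {
        System.out.println(tile("ABC", "CBA"));       // ABCBA
        System.out.println(tile("CBA", "ABC"));       // CBABC
        System.out.println(tile("This is", "is me")); // This is me
    }
}
```

For {ABC, CBA} both orientations overlap by one character, so the result depends only on the order in which the two strings are passed.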

Similar question: Efficient Algorithm for String Concatenation with Overlap

asked Sep 17 '09 20:09

João Silva

1 Answer

Order the strings by first character, then by length (shortest to longest), and then apply the adaptation of KMP found in this question about concatenating overlapping strings.
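A minimal sketch of this suggestion, under some assumptions: the linked KMP adaptation is not reproduced here, so the code below uses the standard KMP failure table to find the longest suffix of the left string that is a prefix of the right string. The names KmpTiling, failureTable, tile and tileAll are hypothetical. The sort order follows the answer, but it is a heuristic and may not produce a valid tiling for every input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class KmpTiling {
    // Standard KMP failure table: fail[i] is the length of the longest proper
    // prefix of p[0..i] that is also a suffix of p[0..i].
    static int[] failureTable(String p) {
        int[] fail = new int[p.length()];
        int k = 0;
        for (int i = 1; i < p.length(); i++) {
            while (k > 0 && p.charAt(i) != p.charAt(k)) {
                k = fail[k - 1];
            }
            if (p.charAt(i) == p.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Appends right to left, sharing the longest suffix of left that is a
    // prefix of right. If right occurs inside left, left is returned as is.
    static String tile(String left, String right) {
        if (right.isEmpty()) {
            return left;
        }
        int[] fail = failureTable(right);
        int k = 0; // number of characters of right currently matched
        for (int i = 0; i < left.length(); i++) {
            while (k > 0 && left.charAt(i) != right.charAt(k)) {
                k = fail[k - 1];
            }
            if (left.charAt(i) == right.charAt(k)) {
                k++;
            }
            if (k == right.length()) {
                return left; // right is fully contained in left
            }
        }
        return left + right.substring(k);
    }

    // Sorts by first character, then by length, and folds the list with tile.
    static String tileAll(List<String> strings) {
        List<String> sorted = new ArrayList<>(strings);
        sorted.sort(Comparator
                .<String>comparingInt(s -> s.isEmpty() ? -1 : s.charAt(0))
                .thenComparingInt(String::length));
        String result = "";
        for (String s : sorted) {
            result = tile(result, s);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tileAll(Arrays.asList("BCD", "CDE", "ABC", "A"))); // ABCDE
    }
}
```

Each call to tile scans both strings once, so a pair costs time proportional to their combined length. The ordering step can go wrong when the correct left-to-right order is not alphabetical: for {ZA, AB} the sort puts AB first and the result is ABZA rather than ZAB.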

answered Sep 17 '22 16:09

Daniel C. Sobral

Tags: string, algorithm, tiling
