Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to speed up calculation of length of longest common substring?

I have two very large strings and I am trying to find out their Longest Common Substring.

One way is using suffix trees (supposed to have a very good complexity, though a complex implementation), and the another is the dynamic programming method (both are mentioned on the Wikipedia page linked above).

Using dynamic programming alt text

The problem is that the dynamic programming method has a huge running time (complexity is O(n*m), where n and m are lengths of the two strings).

What I want to know (before jumping to implement suffix trees): Is it possible to speed up the algorithm if I only want to know the length of the common substring (and not the common substring itself)?

like image 721
Lazer Avatar asked Apr 25 '10 21:04

Lazer


People also ask

What is the time complexity for finding the longest substring that is common?

Let m and n be the lengths of the first and second strings respectively. Dynamic Programming can be used to find the longest common substring in O(m*n) time.

How do you find the longest common substring of multiple strings?

The longest common substrings of a set of strings can be found by building a generalized suffix tree for the strings, and then finding the deepest internal nodes which have leaf nodes from all the strings in the subtree below it.


1 Answers

These will make it run faster, though it'll still be O(nm).

One optimization is in space (which might save you a little allocation time) is noticing that LCSuff only depends on the previous row -- therefore if you only care about the length, you can optimize the O(nm) space down to O(min(n,m)).

The idea is to keep only two rows -- the current row that you are processing, and the previous row that you just processed, and throw away the rest.

like image 181
Larry Avatar answered Oct 23 '22 05:10

Larry