Algorithm to find edit distance to all substrings

Tags: string, algorithm, levenshtein-distance, similarity, edit-distance

Given two strings s and t, I need to find the edit distance (Levenshtein distance) from t to each substring of s. More precisely, for each position i in s I need the minimum edit distance over all substrings that start at position i.

For example:

t = "ab"    
s = "sdabcb"

And I need to get something like:

{2,1,0,2,2}

Explanation:

1st position:
distance("ab", "sd") = 4 (2 substitutions, each counted as 2)
distance("ab", "sda") = 3 (2 deletes + 1 insert)
distance("ab", "sdab") = 2 (2 deletes)
distance("ab", "sdabc") = 3 (3 deletes)
distance("ab", "sdabcb") = 4 (4 deletes)
So, minimum is 2

2nd position:
distance("ab", "da") = 2 (delete + insert)
distance("ab", "dab") = 1 (delete)
distance("ab", "dabc") = 2 (2 deletes)
....
So, minimum is 1

3rd position:
distance("ab", "ab") = 0
...
minimum is 0

and so on.

Of course, I can solve this with a brute-force algorithm, but is there a faster one?

Thanks for any help.
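For reference, a minimal brute-force sketch in Python (the helper names `levenshtein` and `per_start_min_brute` are hypothetical). The `sub_cost` and `min_len` parameters are assumptions inferred from the example: its numbers match when a substitution costs 2 and only substrings at least as long as t are counted, which is also why the example lists five values for a six-character s.

```python
def levenshtein(a: str, b: str, sub_cost: int = 1) -> int:
    """Two-row Levenshtein DP: insert/delete cost 1, substitution costs sub_cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,                                   # delete ca
                cur[j - 1] + 1,                                # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost),   # match or substitute
            )
        prev = cur
    return prev[-1]


def per_start_min_brute(s: str, t: str, sub_cost: int = 1, min_len: int = 0) -> list[int]:
    """For each start i, min distance from t to any s[i:j] with j - i >= min_len.

    sub_cost and min_len are assumptions inferred from the question's example.
    Start positions with no substring of length >= min_len are skipped.
    """
    result = []
    for i in range(len(s)):
        ends = range(i + min_len, len(s) + 1)
        if not ends:
            continue
        # One full DP per substring: this is what makes brute force slow.
        result.append(min(levenshtein(t, s[i:j], sub_cost) for j in ends))
    return result


print(per_start_min_brute("sdabcb", "ab", sub_cost=2, min_len=2))  # [2, 1, 0, 2, 2]
```

With `sub_cost=1` and `min_len=0` this is plain Levenshtein over every substring, including the empty one. It recomputes a full DP table for each of the O(n²) substrings, so it costs roughly O(n³·m) time for |s| = n and |t| = m.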


asked Nov 15 '11 16:11

Ivan Bianko

1 Answer

Finding the best-matching substring within a given string is easy: take the normal Levenshtein algorithm and modify it slightly.

FIRST: Instead of filling the first row of the matrix with 0, 1, 2, 3, 4, 5, ..., fill it entirely with zeros (green rectangle). Here the columns correspond to the haystack characters and the rows to the needle characters.

SECOND: Run the algorithm as usual.

THIRD: Instead of returning the last cell of the last row, search for the smallest value in the last row and return it (red rectangle).
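A minimal Python sketch of the three steps, assuming rows correspond to the needle and columns to the haystack (the function name `best_substring_distance` is hypothetical). This is the standard approximate-substring-matching recurrence, often attributed to Sellers: the zero first row lets a match start at any haystack position for free, and the minimum of the last row lets it end anywhere. Only two rows are kept, so memory is O(|haystack|) and time is O(|needle|·|haystack|).

```python
def best_substring_distance(needle: str, haystack: str, sub_cost: int = 1) -> int:
    """Minimum edit distance between needle and any substring of haystack."""
    # FIRST: the row for "no needle characters used yet" is all zeros,
    # so skipping any prefix of the haystack costs nothing.
    prev = [0] * (len(haystack) + 1)
    for i, cn in enumerate(needle, start=1):
        # Column 0: needle[:i] against an empty substring costs i edits.
        cur = [i] + [0] * len(haystack)
        # SECOND: the usual Levenshtein recurrence.
        for j, ch in enumerate(haystack, start=1):
            cur[j] = min(
                prev[j] + 1,                                   # needle char has no partner
                cur[j - 1] + 1,                                # haystack char has no partner
                prev[j - 1] + (0 if cn == ch else sub_cost),   # match or substitute
            )
        prev = cur
    # THIRD: the match may end anywhere, so return the minimum of the last row.
    return min(prev)


print(best_substring_distance("aba", "c abba c"))  # 1
```

Note that this returns a single minimum over all substrings of the haystack; on its own it does not give the separate per-start-position values asked for in the question.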

Example: needle "aba", haystack "c abba c" --> result = 1 (converting "abba" to "aba" takes one deletion).

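[Figure (image not available): the modified Levenshtein matrix, with the zero-filled first row marked by a green rectangle and the smallest value in the last row marked by a red rectangle.]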

I tested it and it works.

This is much faster than stepping through the string character by character as you do in your question, because you build the matrix only once.
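Before the final minimum is taken, each cell of the last row holds the best distance over substrings ending at that column, so the matrix naturally answers "best substring ending at j". A variant that is not part of the answer, sketched here as our own addition: run the same zero-first-row DP on the reversed needle and the reversed haystack. Reversing both strings does not change edit distance, and an end position in the reversed haystack is a start position in the original, so the last row then gives, for each start position i, the minimum over all substrings beginning at i, still in O(|s|·|t|) time. The name `per_start_min` is hypothetical. Unlike the question's example, it also considers substrings shorter than t (including the empty one), so with `sub_cost=2` it returns [2, 1, 0, 1, 2, 1] for the question's strings rather than {2,1,0,2,2}: position 4 can use the one-character substring "b", and position 6 gets a value too.

```python
def per_start_min(s: str, t: str, sub_cost: int = 1) -> list[int]:
    """result[i] = min edit distance between t and any s[i:j] with i <= j <= len(s).

    Our own variant of the answer's matrix (not described in the answer):
    the same zero-first-row DP, run on reversed strings.
    """
    rs, rt = s[::-1], t[::-1]
    n = len(rs)
    prev = [0] * (n + 1)  # zero first row: a match may begin anywhere in rs
    for i, ct in enumerate(rt, start=1):
        cur = [i] + [0] * n
        for j, cs in enumerate(rs, start=1):
            cur[j] = min(
                prev[j] + 1,
                cur[j - 1] + 1,
                prev[j - 1] + (0 if ct == cs else sub_cost),
            )
        prev = cur
    # prev[j] = best substring of rs ending at column j
    #         = best substring of s starting at index n - j.
    return [prev[n - k] for k in range(n)]


print(per_start_min("sdabcb", "ab", sub_cost=2))  # [2, 1, 0, 1, 2, 1]
```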


answered Oct 24 '22 22:10

Elmue
