String pattern matching with one or zero mismatch

Given a string and a pattern, how can all matches with zero or one mismatch be found efficiently?

For example:
S = abbbaaabbbabab
P = abab

Matches are abbb (index 0), aaab (index 4), abbb (index 6), bbab (index 8), abab (index 10).
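To pin down what counts as a match, here is a minimal brute-force baseline in Python. It is only a reference for checking faster approaches, not an efficient solution: it runs in O(n * m) time. The function name is made up for illustration.

```python
def matches_with_at_most_one_mismatch(text, pattern):
    """Return every start index i where text[i:i+m] differs from
    pattern in at most one position (m = len(pattern))."""
    n, m = len(text), len(pattern)
    result = []
    for i in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                mismatches += 1
                if mismatches > 1:
                    break  # this window already has two differences
        if mismatches <= 1:
            result.append(i)
    return result


print(matches_with_at_most_one_mismatch("abbbaaabbbabab", "abab"))
```

On the example strings it returns indices 0, 4, 6, 8 and 10. Index 8 is the window bbab, which differs from abab only in its first character.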

I tried to modify the KMP algorithm, but I'm not sure about the approach.

Please give me an idea of how to proceed with the problem.

Thanks.

793

asked Apr 12 '12 09:04

Anantha Krishnan

1 Answer

OK, I found it! I found the best algorithm!

This might sound a bit bold, but the algorithm I am going to propose has running time O(m + n) and memory consumption O(m + n). Since the input itself has the same size, the algorithm can only be improved by a constant factor.

Algorithms used

I am going to use a mix of the KMP and Rabin-Karp algorithms for my solution. Rabin-Karp uses rolling hashes to compare substrings of the input strings. It requires linear-time precomputation that uses linear additional memory, but from then on comparing a substring of one string with a substring of the other takes constant time, O(1) (this holds in expectation if you handle hash collisions properly).
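The constant-time substring comparison can be sketched with prefix hashes. This is a minimal illustration, assuming a polynomial hash. The constants `MOD` and `BASE` and the helper names are choices made for this example, not something the answer specifies.

```python
MOD = (1 << 61) - 1  # a large prime modulus (illustrative choice)
BASE = 131           # polynomial base (illustrative choice)


def build_hash(s):
    """Linear-time precomputation: prefix hashes h and powers p of BASE."""
    h = [0] * (len(s) + 1)
    p = [1] * (len(s) + 1)
    for i, ch in enumerate(s):
        h[i + 1] = (h[i] * BASE + ord(ch) + 1) % MOD
        p[i + 1] = (p[i] * BASE) % MOD
    return h, p


def substring_hash(h, p, lo, hi):
    """Hash of s[lo:hi] (half-open range), computed in O(1)."""
    return (h[hi] - h[lo] * p[hi - lo]) % MOD


ht, pt = build_hash("abbbaaabbbabab")
hp, pp = build_hash("abab")
# Equal hashes suggest text[10:14] equals the whole pattern.
print(substring_hash(ht, pt, 10, 14) == substring_hash(hp, pp, 0, 4))
```

Equal hashes only indicate equality with high probability. A careful implementation would confirm a hit with a direct comparison or use two independent hashes.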

What my solution will not do

My solution will not find all the occurrences in the first string that match the second string with at most one difference. However, the algorithm can be modified so that, for every starting index in the first string, if there is such a matching, at least one of them will be found (this is left to the reader).

Observations

Let m be the length of the second string and n the length of the first string. I am going to split the task in two parts: if I am aiming to find a matching with at most one difference, I want to find two substrings of the first string. PREF is going to be the substring before the single difference and SUFF the substring after the difference. I want len(PREF) + len(SUFF) + 1 = m, where PREF or SUFF will be artificially shortened if required (when the strings match without a difference).

I am going to base my solution on one very important observation: suppose there is a substring of the first string starting at index i with length m that matches the second string with at most one difference. Then if we take PREF as long as possible, there will still be a solution for SUFF. This is obvious: I am just pushing the difference as far towards the end as possible.
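For a single fixed starting index, the observation can be sketched as follows. This is a hypothetical helper that uses plain character comparisons rather than hashes, and it assumes that i + len(pattern) does not run past the end of the text.

```python
def window_matches(text, pattern, i):
    """Check the window text[i:i+m] against pattern, allowing one difference.

    Assumes i + len(pattern) <= len(text).
    """
    m = len(pattern)
    # PREF: extend the exact match from the left as far as it goes.
    k = 0
    while k < m and text[i + k] == pattern[k]:
        k += 1
    if k == m:
        return True  # exact match, no difference at all
    # Position k is the single allowed difference.
    # SUFF: everything after it must match exactly.
    return text[i + k + 1:i + m] == pattern[k + 1:]


text, pattern = "abbbaaabbbabab", "abab"
print([i for i in range(len(text) - len(pattern) + 1)
       if window_matches(text, pattern, i)])
```

Because the first difference is forced to be the only one, taking PREF greedily never loses a valid window.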

The algorithm

And now follows the algorithm itself. Start off with the usual KMP. Every time the extension of the prefix fails and the fail links are to be followed, first check whether, if you skip the next letter, the remaining suffix matches the remainder of the second string. If so, the sought match with at most one character difference is found. If not, we go on with the ordinary KMP, making the Rabin-Karp check every time a fail link is to be followed.

Let me clarify the Rabin-Karp check further with an example. Suppose we are at a certain step of the KMP and we have found that first.substring[i, i + k - 1] matches the first k letters of the second string. Suppose also that the letter first[i + k] is different from second[k]. Then you check whether first.substring[i + k + 1, i + m - 1] matches second.substring[k + 1, m - 1] exactly, using Rabin-Karp. This is exactly the case in which you have extended the starting prefix from index i as much as possible and now test whether there is a match with at most one difference.

Rabin-Karp will be used only when a fail link is followed, which moves the starting index of the prefix forward by at least one. This means that at most O(n) Rabin-Karp calls are made, each with complexity O(1), for a total of linear complexity.
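One possible reading of the steps above, as a hedged Python sketch. It includes its own small rolling-hash helpers so that it runs on its own. All names and constants are assumptions made for illustration. It trusts hash equality without re-verifying, and it reports only the windows that the KMP state actually visits, so in general it can miss some matches.

```python
MOD = (1 << 61) - 1  # illustrative modulus
BASE = 131           # illustrative base


def prefix_hashes(s):
    h, p = [0], [1]
    for ch in s:
        h.append((h[-1] * BASE + ord(ch) + 1) % MOD)
        p.append(p[-1] * BASE % MOD)
    return h, p


def sub_hash(h, p, lo, hi):
    """Hash of s[lo:hi] in O(1)."""
    return (h[hi] - h[lo] * p[hi - lo]) % MOD


def failure_table(pattern):
    """Standard KMP fail links: longest proper border of each prefix."""
    fail = [0] * len(pattern)
    k = 0
    for j in range(1, len(pattern)):
        while k and pattern[j] != pattern[k]:
            k = fail[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    return fail


def kmp_one_mismatch(text, pattern):
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    fail = failure_table(pattern)
    ht, pw = prefix_hashes(text)
    hp, _ = prefix_hashes(pattern)
    found = set()

    def check(pos, k):
        # The window starts at pos - k and text[pos] is the one difference.
        # Compare the remaining suffix with the rest of the pattern by hash.
        start = pos - k
        if start + m <= n and (sub_hash(ht, pw, pos + 1, start + m)
                               == sub_hash(hp, pw, k + 1, m)):
            found.add(start)

    k = 0  # length of the pattern prefix matched so far
    for pos in range(n):
        while k and text[pos] != pattern[k]:
            check(pos, k)        # try "skip this letter" before the fail link
            k = fail[k - 1]
        if text[pos] == pattern[k]:
            k += 1
        else:
            check(pos, 0)        # difference at the first pattern letter
        if k == m:
            found.add(pos - m + 1)  # exact match
            k = fail[k - 1]
    return sorted(found)


print(kmp_one_mismatch("abbbaaabbbabab", "abab"))
```

On the question's example this sketch reports 0, 4, 6, 8 and 10. Each hash check happens either just before a fail link is followed or when no prefix is matched at all, so the number of checks stays linear in n.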

155

answered Sep 29 '22 12:09

Boris Strandjev

Tags:

string

string-matching

algorithm

pattern-matching

knuth-morris-pratt