I would also like to know which algorithm has the best worst-case complexity for finding all occurrences of one string in another. It seems that the Boyer–Moore algorithm has linear time complexity.


What's the worst case complexity for KMP when the goal is to find all occurrences of a certain string?

1 Answer

The KMP algorithm has linear complexity for finding all occurrences of a pattern in a string, like the Boyer-Moore algorithm¹. If you try to find a pattern like "aaaaaa" in a string like "aaaaaaaaa", once you have the first complete match,

aaaaaaaaa
aaaaaa
 aaaaaa
      ^

the border table contains the information that the next longest possible match of a prefix of the pattern (corresponding to the widest border of the pattern) is just one character short. In this respect, a complete match is equivalent to a mismatch one past the end of the pattern. Thus the pattern is moved one place further, and since the border table tells us that all characters of the pattern, except possibly the last, match, the next comparison is between the last pattern character and the aligned text character. In this particular case (finding occurrences of a^m in a^n), which is the worst case for the naive matching algorithm, the KMP algorithm compares each text character exactly once.
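To make the border table concrete, here is a minimal sketch of one common way to build it. The function name `border_table` and the `-1` sentinel for the empty prefix are my own choices for illustration, not something fixed by the algorithm.

```python
def border_table(pattern):
    """border[i] = length of the widest proper border of pattern[:i]."""
    m = len(pattern)
    border = [0] * (m + 1)
    border[0] = -1  # sentinel: the empty prefix has no border
    k = -1
    for i in range(m):
        # Fall back through ever shorter borders until one can be extended
        # by pattern[i], or until none is left (k == -1).
        while k >= 0 and pattern[k] != pattern[i]:
            k = border[k]
        k += 1
        border[i + 1] = k
    return border


print(border_table("aaaaaa"))  # [-1, 0, 1, 2, 3, 4, 5]
```

The last entry is 5, one less than the pattern length, which is exactly the "one character short" situation described above: after a complete match, five pattern characters are already known to match at the next alignment.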

In each step, at least one of

- the position of the text character compared
- the position of the first character of the pattern with respect to the text

increases, and neither ever decreases. The position of the text character compared can increase at most length(text)-1 times, the position of the first pattern character can increase at most length(text) - length(pattern) times, so the algorithm takes at most 2*length(text) - length(pattern) - 1 steps.

The preprocessing (construction of the border table) takes at most 2*length(pattern) steps, so the overall complexity is O(m+n), and no more than m + 2*n steps are executed, where m is the length of the pattern and n the length of the text.
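The counting argument can be checked empirically. The following is a sketch of a KMP search that reports all matches together with the number of character comparisons made during the search. The name `kmp_find_all` and the comparison counter are illustrative assumptions. This version does not stop early when too little text remains for a match, so its count is only guaranteed to stay within the looser bound 2*n, not the tighter figure given above.

```python
def kmp_find_all(pattern, text):
    """Return (match positions, number of character comparisons in the search)."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    # Border table: border[i] is the widest proper border of pattern[:i].
    border = [-1] + [0] * m
    k = -1
    for i in range(m):
        while k >= 0 and pattern[k] != pattern[i]:
            k = border[k]
        k += 1
        border[i + 1] = k

    matches, comparisons = [], 0
    j = 0  # number of pattern characters currently matched
    for i in range(n):
        # On a mismatch, fall back to the widest border and retry the same
        # text character; the text position never moves backwards.
        while j >= 0:
            comparisons += 1
            if pattern[j] == text[i]:
                break
            j = border[j]
        j += 1
        if j == m:
            matches.append(i - m + 1)
            j = border[m]  # a complete match is treated like a mismatch past the end
    return matches, comparisons


print(kmp_find_all("aaaaaa", "aaaaaaaaa"))  # ([0, 1, 2, 3], 9)
```

For a^6 in a^9 the search makes 9 comparisons, one per text character. A mismatch-heavy input such as `kmp_find_all("ab", "aaaa")` makes 7 comparisons, still below 2*n = 8.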

¹ Note that the Boyer-Moore algorithm as commonly presented has a worst-case complexity of O(m*n) for periodic patterns and texts like a^m and a^n if all matches are required, because after a complete match,

aaaaaaaaa
aaaaaa
 aaaaaa
      ^
  <- <-
 ^

the entire pattern would be re-compared. To avoid that, you need to remember how long a prefix of the pattern still matches after the shift following a complete match and only compare the new characters.
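The effect of remembering the matched prefix (an idea usually credited to Galil) can be seen in a deliberately simplified sketch. This is not a full Boyer-Moore implementation: it compares right to left, but on a mismatch it just shifts by one instead of using the bad-character and good-suffix tables, and it finds the pattern's period naively. The function name and the `remember_prefix` flag are hypothetical, introduced only for this illustration.

```python
def right_to_left_find_all(pattern, text, remember_prefix):
    """Return (match positions, number of character comparisons)."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    # Smallest period of the pattern, found naively to keep the sketch short.
    period = next(p for p in range(1, m + 1) if pattern[p:] == pattern[:m - p])

    matches, comparisons = [], 0
    shift = 0  # current alignment of the pattern in the text
    known = 0  # length of the pattern prefix already known to match
    while shift + m <= n:
        j = m - 1
        while j >= known:  # compare right to left, skipping the known prefix
            comparisons += 1
            if pattern[j] != text[shift + j]:
                break
            j -= 1
        if j < known:  # every character that needed checking matched
            matches.append(shift)
            shift += period
            # After shifting by the period, the first m - period characters
            # are guaranteed to match again; optionally remember that.
            known = m - period if remember_prefix else 0
        else:
            shift += 1  # simplification: real Boyer-Moore shifts via its tables
            known = 0
    return matches, comparisons


print(right_to_left_find_all("aaaaaa", "aaaaaaaaa", remember_prefix=False))  # ([0, 1, 2, 3], 24)
print(right_to_left_find_all("aaaaaa", "aaaaaaaaa", remember_prefix=True))   # ([0, 1, 2, 3], 9)
```

Without the memory, every one of the four matches re-compares all six pattern characters (24 comparisons). With it, only the one new character is compared after each shift (9 comparisons in total).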


answered Sep 28 '22 07:09

Daniel Fischer


Tags:

string

algorithm

time-complexity

knuth-morris-pratt

Ouais Alsharif
