Why does String.indexOf() not use KMP?

Tags: java, string, knuth-morris-pratt

I read the source code of java.lang.String and was surprised to find that String.indexOf() does not use the Knuth–Morris–Pratt (KMP) algorithm. As far as I know, KMP is more efficient. So why isn't it used in String.indexOf()?

Someone told me that KMP is good enough for short strings, but that it is not a good choice if you need performance and intend to use it with large strings. However, he didn't give me any details.

So, here are my questions:

Why don't we use KMP in String.indexOf()?
Why is KMP not a good choice with large strings?

asked Oct 23 '13 13:10

D0n9X1n

2 Answers

KMP has better worst-case performance, but it requires some up-front computation (to generate the table of offsets). It also requires an initial memory allocation, which could impact performance as well.
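To make the up-front cost concrete, here is a minimal sketch of how the KMP table is typically built. The class and method names (KmpTableSketch, buildFailureTable) are made up for illustration and are not part of the JDK; the point is only that an int[] of length m has to be allocated and filled before the search can start.

```java
import java.util.Arrays;

// Illustrative sketch only: names are hypothetical, not JDK code.
final class KmpTableSketch {

    // failure[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] buildFailureTable(String pattern) {
        int m = pattern.length();
        int[] failure = new int[m];   // the O(m) up-front allocation
        int k = 0;                    // length of the currently matched prefix
        for (int i = 1; i < m; i++) {
            // On a mismatch, fall back to the next shorter candidate prefix.
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = failure[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            failure[i] = k;
        }
        return failure;
    }

    public static void main(String[] args) {
        // prints [0, 0, 1, 2, 3, 0, 1]
        System.out.println(Arrays.toString(buildFailureTable("ababaca")));
    }
}
```

For a pattern of a handful of characters this work is small in absolute terms, but it is paid on every call, even when the match is found at the first position.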

For the (presumably) common use case of searching in relatively short strings, this might actually end up slower than the naive implementation.

This, together with the fact that for really huge data sets you will probably be using more specialized data structures than a simple String, means that the increased implementation (and possibly runtime) cost is not worth the investment.
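For reference, a naive search can be as small as the sketch below. This is not the actual JDK source, just an assumed minimal version (NaiveSearchSketch and naiveIndexOf are hypothetical names) that shows why there is nothing to set up: no table, no allocation, and a very tight inner loop.

```java
// Illustrative sketch only: not the JDK implementation of String.indexOf().
final class NaiveSearchSketch {

    // Returns the index of the first occurrence of pattern in text, or -1.
    static int naiveIndexOf(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        // Try every alignment of the pattern against the text.
        for (int i = 0; i + m <= n; i++) {
            int j = 0;
            while (j < m && text.charAt(i + j) == pattern.charAt(j)) {
                j++;
            }
            if (j == m) {
                return i;   // all m characters matched at offset i
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Should agree with the built-in method.
        System.out.println(naiveIndexOf("hello world", "world"));
        System.out.println("hello world".indexOf("world"));
    }
}
```

On typical inputs most alignments fail on the first or second character, so the loop behaves close to linearly in practice.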

Note that this might change in future Java versions, as the actual algorithm is not specified.

answered Sep 28 '22 17:09

Joachim Sauer

KMP and several other asymptotically efficient string search methods, like Boyer-Moore and Boyer-Moore-Horspool, require extra memory: in the case of KMP, O(m) memory, where m is the size of the substring being searched for. Although this is often acceptable, library designers have to make tradeoffs so that their code performs acceptably well in many different situations. Probably the main reason is the constant factor: because of both the preprocessing required by KMP and its more complex inner loop in the search phase, it may be several times slower than the naive O(mn) substring search in many common cases (e.g. searching for a substring of < 10 characters in a long string). Also, someone searching for a large substring might be perplexed to find the runtime library running out of memory as it tries to allocate a large buffer for the KMP failure function table.

Perhaps a better question is why O(m+n)-time, O(1)-space algorithms like the Two-Way Algorithm have not yet been adopted by mainstream language runtime libraries. Again, the answer is likely to be the constant-factor slowdown in common cases. Nevertheless, in at least one C runtime library implementation, the corresponding strstr() function has been updated to use this algorithm.

"Someone around me told me that for short string KMP is good enough, but if you need performance and you intend to use with large string then is not a good choice."

Well, that's exactly backwards from my understanding, which is that the naive O(mn) substring search is good enough (and probably the best) for short strings, but will eventually lose out to asymptotically faster O(m+n) algorithms like KMP as the strings become longer.
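To illustrate the asymptotic difference, here is a rough sketch that counts character comparisons for both approaches on a deliberately bad input (a text of all 'a' characters and a pattern of 'a's ending in 'b'). Everything here is hypothetical illustration code, including the names ComparisonCountSketch, naiveIndexOf, kmpIndexOf and repeat, and the counts say nothing about wall-clock time on realistic inputs.

```java
import java.util.Arrays;

// Illustrative sketch only: counts character comparisons, not real time.
final class ComparisonCountSketch {
    private long comparisons;

    // Every character comparison goes through here so it can be counted.
    private boolean same(char a, char b) {
        comparisons++;
        return a == b;
    }

    long comparisons() {
        return comparisons;
    }

    int naiveIndexOf(String text, String pattern) {
        int n = text.length(), m = pattern.length();
        for (int i = 0; i + m <= n; i++) {
            int j = 0;
            while (j < m && same(text.charAt(i + j), pattern.charAt(j))) j++;
            if (j == m) return i;
        }
        return -1;
    }

    int kmpIndexOf(String text, String pattern) {
        int n = text.length(), m = pattern.length();
        if (m == 0) return 0;
        // Preprocessing: build the failure table (O(m) time and space).
        int[] failure = new int[m];
        for (int i = 1, k = 0; i < m; i++) {
            while (k > 0 && !same(pattern.charAt(i), pattern.charAt(k))) k = failure[k - 1];
            if (same(pattern.charAt(i), pattern.charAt(k))) k++;
            failure[i] = k;
        }
        // Search: the text index i never moves backwards.
        for (int i = 0, k = 0; i < n; i++) {
            while (k > 0 && !same(text.charAt(i), pattern.charAt(k))) k = failure[k - 1];
            if (same(text.charAt(i), pattern.charAt(k))) k++;
            if (k == m) return i - m + 1;
        }
        return -1;
    }

    // Small helper to build a string of `count` copies of `c`.
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String text = repeat('a', 10_000);
        String pattern = repeat('a', 99) + "b";   // m = 100, never matches

        ComparisonCountSketch naive = new ComparisonCountSketch();
        naive.naiveIndexOf(text, pattern);
        ComparisonCountSketch kmp = new ComparisonCountSketch();
        kmp.kmpIndexOf(text, pattern);

        // Naive does m comparisons at each of the n - m + 1 alignments.
        System.out.println("naive comparisons: " + naive.comparisons());
        System.out.println("KMP comparisons:   " + kmp.comparisons());
    }
}
```

On this input the naive search does (n - m + 1) * m comparisons, while the KMP count stays proportional to n + m. With a short pattern and ordinary text the two counts are much closer, and the naive loop has no table to build.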

answered Sep 28 '22 17:09

j_random_hacker
