
Choice of algorithm for .indexOf method in Java

I was just looking at the implementation of the Java String class's .indexOf() method, and it seems the author of the code uses the brute-force algorithm to find the substring in a given string. That is, the approach runs in O(mn), where m and n are the lengths of the source and target strings, respectively.
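For reference, the brute-force approach described above amounts to roughly the following. This is a minimal sketch and not the JDK's actual source; the class name `NaiveSearch` is made up for illustration.

```java
final class NaiveSearch {
    // Returns the index of the first occurrence of target in source, or -1.
    // m = source length, n = target length; worst case is O(mn) comparisons.
    static int indexOf(String source, String target) {
        int m = source.length();
        int n = target.length();
        // Try every starting position where the target could still fit.
        for (int i = 0; i + n <= m; i++) {
            int j = 0;
            // Compare character by character until a mismatch or a full match.
            while (j < n && source.charAt(i + j) == target.charAt(j)) {
                j++;
            }
            if (j == n) {
                return i; // all n characters matched at offset i
            }
        }
        return -1;
    }
}
```

Note that there is no setup step at all: the loop starts comparing immediately, which is why it is hard to beat on short inputs.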

Why didn't the author use a more efficient algorithm like Rabin-Karp, which has an expected runtime of O(m + n) if a good hash function is provided?

I may be missing the full reasoning behind this implementation, so I wanted to understand it.

name_masked asked Feb 14 '11 20:02



1 Answer

I don't know for sure why this decision was made, but if I had to guess, it's probably because for small pattern strings (a very common use case) the naive brute-force algorithm is as fast as, if not faster than, asymptotically faster algorithms like Rabin-Karp, Boyer-Moore, or Knuth-Morris-Pratt. This seems like a reasonable default, since in many cases you'll be searching small strings for small patterns, and the setup overhead of a more powerful matcher would probably be comparable to the entire runtime of the naive approach.
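To make the setup overhead concrete, here is a rough Rabin-Karp sketch. The class name, `BASE`, and `MOD` are arbitrary choices for illustration, not anything taken from the JDK.

```java
final class RabinKarpSearch {
    // Illustrative constants: a small base and a large prime modulus.
    private static final long BASE = 256;
    private static final long MOD = 1_000_000_007L;

    // Returns the index of the first occurrence of target in source, or -1.
    static int indexOf(String source, String target) {
        int m = source.length();
        int n = target.length();
        if (n == 0) return 0;
        if (n > m) return -1;

        // Setup cost: BASE^(n-1) mod MOD, plus hashes of the target
        // and of the first window of the source.
        long high = 1;
        for (int i = 1; i < n; i++) {
            high = (high * BASE) % MOD;
        }
        long targetHash = 0;
        long windowHash = 0;
        for (int i = 0; i < n; i++) {
            targetHash = (targetHash * BASE + target.charAt(i)) % MOD;
            windowHash = (windowHash * BASE + source.charAt(i)) % MOD;
        }

        for (int i = 0; ; i++) {
            // Equal hashes can be a collision, so confirm with a real comparison.
            if (windowHash == targetHash && source.regionMatches(i, target, 0, n)) {
                return i;
            }
            if (i + n >= m) {
                return -1; // no further window fits
            }
            // Roll the hash: drop source[i], append source[i + n].
            windowHash = (windowHash - source.charAt(i) * high % MOD + MOD) % MOD;
            windowHash = (windowHash * BASE + source.charAt(i + n)) % MOD;
        }
    }
}
```

Before the first comparison happens, this version has already made two passes over n characters and done several modular multiplications per character. For a three- or four-character pattern, the naive loop may well have finished by then.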

That said, nothing in the Java spec mandates the use of this algorithm. They could just as easily have picked Rabin-Karp as the default.

Another reason they may have opted for this approach is that if you want to do fast text searching, the regex library provides faster string matching with more powerful search capabilities. Giving users the simple brute-force algorithm by default, with the option to switch to a more powerful set of tools when needed, seems like a good way to balance asymptotic efficiency with practical efficiency.
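If you did want to go through the regex library for a plain substring search, it might look something like this. The `RegexSearch` wrapper is hypothetical; `Pattern.quote`, `Matcher.find`, and `Matcher.start` are the standard `java.util.regex` calls. Whether this is actually faster than `indexOf` for a given input is something you'd have to measure.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexSearch {
    // Returns the index of the first occurrence of target in source, or -1.
    static int indexOf(String source, String target) {
        // Pattern.quote makes the target match literally, so characters
        // like '.' or '*' are not treated as regex metacharacters.
        Pattern pattern = Pattern.compile(Pattern.quote(target));
        Matcher matcher = pattern.matcher(source);
        return matcher.find() ? matcher.start() : -1;
    }
}
```

In real code you would compile the `Pattern` once and reuse it across searches, since compilation is the setup cost here.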

templatetypedef answered Oct 15 '22 21:10
