I know that there are fast string search algorithms, like Boyer–Moore and Knuth–Morris–Pratt, that have O(n+m) complexity, while the trivial solution would be O(n*m).
So does the implementation of strstr() for the most popular toolchains - gcc and Visual Studio - use these fast O(n) algorithms, or does it use the trivial solution?
GCC's runtime library uses Two-Way Algorithm
which performs 2n-m text character comparisons in the worst case. It is O(n) complexity in the search phase but it needs a extra preprocessing phase which is O(m) complexity. You could find details on http://www-igm.univ-mlv.fr/~lecroq/string/node26.html about the algorithm.
AFAIK MSVC runtime is doing strstr
in the most naive way, in O(n*m) complexity. But brute force doesn't need extra memory space, so it never raise a bad alloc exception. KMP need O(m) extra space, and Two-Way need a constant extra space.
What GCC is doing sounds just like using FFT to calculate multiplys. Looks extremely fast in paper, but really slow in practice. MSVC will use SIMD instructions in strstr
when they are availiable so it's even faster in most case. I will choose the brute force approach with SIMD if I'm going to write my own library.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With