I need to implement a way to search substring (needles) in a list of string (haystack) using Java.
More specifically, my app has a list of user profiles. If I type some letters, for example, "Ja", and then search, then all the users whose name contains "ja" should show up. For instance, the result could be "Jack", "Jackson", "Jason", "Dijafu".
In Java, as I know, there are 3 build-in method to see search substring in a string.
string.contains()
string.indexOf()
regular expression. it is something like string.matches("ja"))
My question is: What are the runtimes of each method above? which one is the fastest or most efficient or most popular way to check if the list of string contains a given substring.
I know there exists some algorithms that do the same thing, such as Boyer–Moore string search algorithm, Knuth–Morris–Pratt algorithm and so on. I do not want to use them because I just have a small list of strings, and I think using them is kind of overkill for me right now. Also I have to type a lot of extra coding for such a non-build-in algorithm. If you think my thoughts is not correct, please feel free to correct me.
The Aho-Corasick string searching algorithm simultaneously finds all occurrences of multiple patterns in one pass through the text. On the other hand, the Boyer-Moore algorithm is understood to be the fastest algorithm for a single pattern.
substring() is blazingly fast. The only thing that would be faster still is to avoid object creation completely by keeping track of the substring start and end positions in variables.
To locate a substring in a string, use the indexOf() method.
String. contains is not always faster than regex. For normal cases, contains is faster.
The accepted answer is not correct and not complete.
indexOf()
does a naive string search using backtracking on mismatches. This is quite fast on small patterns/texts but shows very poor performance on large textscontains("ja")
should be comparable to indexOf (because it delegates to it)matches("ja")
will not deliver the correct result, because it searches for an exact match (only the string "ja"
will match exactly)Pattern p = Pattern.compile("ja"); Matcher m = p.matcher("jack"); m.find();
would be the correct way to find texts with regular expressions. In practice (using large texts) it will be the most efficient way using only the java api. This is because a constant pattern (like "ja"
) will not be processed by the regex engine (which is slow) but by an Boyer-Moore-Algorithm (which is fast)If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With