What is the fastest substring search method in Java?

Tags: java, substring, regex, search, search-engine

I need to implement a way to search for substrings (needles) in a list of strings (the haystack) using Java.

More specifically, my app has a list of user profiles. If I type some letters, for example "Ja", and then search, all the users whose names contain "ja" should show up. For instance, the result could be "Jack", "Jackson", "Jason", "Dijafu".

In Java, as far as I know, there are three built-in ways to search for a substring in a string:

string.contains()
string.indexOf()
A regular expression, something like string.matches("ja")

My question is: what is the runtime of each method above? Which one is the fastest, most efficient, or most popular way to check whether a list of strings contains a given substring?

I know there are algorithms that do the same thing, such as the Boyer–Moore string search algorithm, the Knuth–Morris–Pratt algorithm and so on. I do not want to use them because I only have a small list of strings, and I think using them is overkill for me right now. I would also have to write a lot of extra code for an algorithm that is not built in. If you think my reasoning is wrong, please feel free to correct me.

566

asked Aug 20 '13 16:08

Joey

1 Answer

The accepted answer is not correct and not complete.

indexOf() does a naive string search, backtracking on each mismatch. This is quite fast for small patterns and texts but performs very poorly on large texts.
contains("ja") should be comparable to indexOf() (because it delegates to it).
matches("ja") will not deliver the correct result, because it tests for an exact match of the whole string (only the string "ja" itself will match).
Pattern p = Pattern.compile("ja"); Matcher m = p.matcher("jack"); m.find(); would be the correct way to find text with regular expressions. In practice (with large texts) it will be the most efficient way using only the Java API. This is because a constant pattern (like "ja") will not be processed by the regex engine (which is slow) but by a Boyer-Moore algorithm (which is fast).
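
To make the difference between the four calls concrete, here is a minimal sketch of how they behave on the example from the question. The class name SubstringSearchSketch and the helper filterByName are hypothetical names chosen for illustration, and the case-insensitive matching is an assumption based on the question expecting "Ja" to match "Dijafu". It only demonstrates correctness, not speed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SubstringSearchSketch {

    // Hypothetical helper: returns the names that contain the needle, ignoring case.
    static List<String> filterByName(List<String> names, String needle) {
        // LITERAL treats the needle as plain text, so characters such as '.' or '*'
        // typed by a user are not interpreted as regex syntax.
        // The pattern is compiled once and reused for every name.
        Pattern pattern = Pattern.compile(needle, Pattern.LITERAL | Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).find()) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "jack";
        System.out.println(text.contains("ja"));                         // true
        System.out.println(text.indexOf("ja") >= 0);                     // true
        System.out.println(text.matches("ja"));                          // false: whole string must match
        System.out.println(Pattern.compile("ja").matcher(text).find()); // true

        List<String> names = Arrays.asList("Jack", "Jackson", "Jason", "Dijafu", "Bob");
        System.out.println(filterByName(names, "Ja")); // [Jack, Jackson, Jason, Dijafu]
    }
}
```

For a small list like the one in the question, any of the working variants should be fast enough. Whether the regex path actually beats indexOf() depends on the JDK version and on the pattern: as far as I can tell from the OpenJDK sources, the Boyer-Moore optimization is only applied to literal patterns of four or more characters, so it is worth benchmarking on your own data before relying on it.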

153

answered Sep 21 '22 12:09

CoronA
