If I am looking for a particular word inside a string, for example "are" in the string "how are you", would a regular indexOf() work faster and better than a regex matches()?
String testStr = "how are you";
String lookUp = "are";
// METHOD 1
if (testStr.indexOf(lookUp) != -1)
{
    System.out.println("Found!");
}
// OR
// METHOD 2 (String has no match() method; the regex method is matches())
if (testStr.matches(".*" + lookUp + ".*"))
{
    System.out.println("Found!");
}
Which of the two methods above is a better way of looking for a string inside another string? Or is there a much better alternative?
You can search for a particular character or substring in a string using the indexOf() method of the String class. This method returns the index of the first occurrence within the string if found. Otherwise it returns -1.
The Python string find() method returns the lowest index, that is, the first occurrence, of the substring if it is found in the given string. If it is not found, it returns -1.
Search for a character in a string with strchr and strrchr: the strchr function returns the first occurrence of a character within a string, and strrchr returns the last occurrence. Both return a pointer to the character found, or a NULL pointer if the character is not found.
You can extract a substring from a String using the substring() method of the String class; pass it the start and end indexes of the required substring.
If you don't care whether it's actually the entire word you're matching, then indexOf() will be a lot faster.
If, on the other hand, you need to be able to differentiate between are, harebrained, aren't etc., then you need a regex: \bare\b will only match are as an entire word (\\bare\\b in Java).
\b is a word boundary anchor, and it matches the empty space between an alphanumeric character (letter, digit, or underscore) and a non-alphanumeric character.
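To make the difference concrete, here is a minimal sketch comparing a plain substring test with a whole-word test. The class and method names (WordSearch, containsSubstring, containsWord) are made up for illustration, and wrapping the term in Pattern.quote() is an added precaution that is not part of the answer above.

```java
import java.util.regex.Pattern;

class WordSearch {
    // Plain substring test: true even when the term is only part of a longer word.
    static boolean containsSubstring(String text, String term) {
        return text.indexOf(term) != -1;
    }

    // Whole-word test: \b on both sides of the term.
    // Pattern.quote() makes the term match literally (an assumption added here).
    static boolean containsWord(String text, String term) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsSubstring("a harebrained idea", "are")); // true
        System.out.println(containsWord("a harebrained idea", "are"));      // false
        System.out.println(containsWord("how are you", "are"));             // true
    }
}
```

Note that find() is used instead of matches(), so no leading and trailing .* is needed.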
Caveat: This also means that if your search term isn't actually a word (let's say you're looking for ###), then these word boundary anchors will only match in a string like aaa###zzz, but not in +++###+++.
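The caveat can be reproduced with a short sketch. The second method shows one possible alternative, using lookarounds that only require the term not to touch a word character; whether that is the behaviour you want is an assumption, and the names (BoundaryCaveat and its methods) are hypothetical.

```java
import java.util.regex.Pattern;

class BoundaryCaveat {
    // \b on both sides: needs a word character on exactly one side of each boundary.
    static boolean findWithWordBoundaries(String text, String term) {
        return Pattern.compile("\\b" + Pattern.quote(term) + "\\b")
                      .matcher(text).find();
    }

    // Alternative (assumption): the term must not be directly preceded
    // or followed by a word character.
    static boolean findNotTouchingWordChars(String text, String term) {
        return Pattern.compile("(?<!\\w)" + Pattern.quote(term) + "(?!\\w)")
                      .matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(findWithWordBoundaries("aaa###zzz", "###"));   // true
        System.out.println(findWithWordBoundaries("+++###+++", "###"));   // false
        System.out.println(findNotTouchingWordChars("+++###+++", "###")); // true
        System.out.println(findNotTouchingWordChars("aaa###zzz", "###")); // false
    }
}
```

For an ordinary word such as are, both variants agree on "how are you"; they only differ when the term itself starts or ends with a non-word character.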
Further caveat: Java has by default a limited worldview on what constitutes an alphanumeric character. Only ASCII letters/digits (plus the underscore) count here, so word boundary anchors will fail on words like élève, relevé or ärgern. Read more about this (and how to solve this problem) here.
Method one should be faster because it has less overhead. If the concern is performance when searching huge files, a specialized algorithm such as Boyer-Moore pattern matching could bring further improvements.
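As a rough illustration of that family of algorithms, here is a sketch of the Horspool simplification of Boyer-Moore (bad-character shifts only, not the full algorithm). The class name HorspoolSearch is hypothetical, and nothing here implies it beats String.indexOf() on short inputs; that would need measuring on your own data.

```java
import java.util.HashMap;
import java.util.Map;

class HorspoolSearch {
    // Returns the index of the first occurrence of pattern in text, or -1.
    static int indexOf(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) return 0;   // same convention as String.indexOf("")
        if (m > n) return -1;

        // For every pattern character except the last one, store the distance
        // from its rightmost occurrence to the end of the pattern.
        Map<Character, Integer> shift = new HashMap<>();
        for (int i = 0; i < m - 1; i++) {
            shift.put(pattern.charAt(i), m - 1 - i);
        }

        int pos = 0;
        while (pos <= n - m) {
            // Compare the current window right to left.
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) {
                j--;
            }
            if (j < 0) return pos;
            // Skip ahead based on the text character under the pattern's last position;
            // characters not in the table allow a full-length jump.
            pos += shift.getOrDefault(text.charAt(pos + m - 1), m);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("how are you", "are")); // 4
    }
}
```

The gain comes from the skip step: when the character under the end of the window does not occur in the pattern, the window moves forward by the whole pattern length instead of by one.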
If you are looking for a fixed string, not a pattern, as in the example in your question, indexOf will be better (simpler) and faster, since it does not need to use regular expressions.
Also, if the string you are searching for does contain characters that have a special meaning in regular expressions, with indexOf you don't need to worry about escaping these characters.
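A small sketch of what can go wrong, using the search term 1+1, where + is a regex quantifier. The class and method names are invented for this example; Pattern.quote() is the standard way to escape a literal term if you do have to go through a regex.

```java
import java.util.regex.Pattern;

class LiteralSearch {
    // indexOf treats the term as plain text.
    static boolean viaIndexOf(String text, String term) {
        return text.indexOf(term) != -1;
    }

    // The term is interpreted as a regex, so "1+1" means "one or more 1s, then a 1".
    static boolean viaUnescapedRegex(String text, String term) {
        return text.matches(".*" + term + ".*");
    }

    // Pattern.quote() turns the term back into a literal.
    static boolean viaQuotedRegex(String text, String term) {
        return text.matches(".*" + Pattern.quote(term) + ".*");
    }

    public static void main(String[] args) {
        String text = "what is 1+1?";
        System.out.println(viaIndexOf(text, "1+1"));          // true
        System.out.println(viaUnescapedRegex(text, "1+1"));   // false
        System.out.println(viaQuotedRegex(text, "1+1"));      // true
        System.out.println(viaUnescapedRegex("a11b", "1+1")); // true, a false positive
    }
}
```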
In general, use indexOf where possible, and matches for pattern matching, where indexOf cannot do what you need.