
Search for a word in a String

Tags: java, string, regex

If I am looking for a particular word inside a string (for example, "are" in the string "how are you"), would a plain indexOf() be faster and better than a regex matches()?

String testStr = "how are you";
String lookUp = "are";

// METHOD 1: plain substring search
if (testStr.indexOf(lookUp) != -1)
{
    System.out.println("Found!");
}

// OR
// METHOD 2: regex that must match the whole string
if (testStr.matches(".*" + lookUp + ".*"))
{
    System.out.println("Found!");
}

Which of the two methods above is a better way of looking for a string inside another string? Or is there a much better alternative?

  • Ivard
topgun_ivard asked Oct 07 '10 06:10




3 Answers

If you don't care whether it's actually the entire word you're matching, then indexOf() will be a lot faster.

If, on the other hand, you need to be able to differentiate between are, harebrained, aren't etc., then you need a regex: \bare\b will only match are as an entire word (\\bare\\b in Java).

\b is a word boundary anchor: it matches the zero-width position between a word character (letter, digit, or underscore) and a non-word character, or between a word character and the start or end of the string.
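As a rough illustration of the whole-word check described above, here is a minimal sketch. The class and method names (WholeWordDemo, containsWord) are made up for this example, and Pattern.quote is an addition so that the search term is always treated literally.

```java
import java.util.regex.Pattern;

final class WholeWordDemo {
    // Hypothetical helper: true if `word` occurs in `text` with a word boundary on both sides.
    static boolean containsWord(String text, String word) {
        // Pattern.quote keeps any regex metacharacters in `word` literal.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        // find() searches anywhere in the text; matches() would require the whole string to match.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("how are you", "are"));      // true
        System.out.println(containsWord("harebrained", "are"));      // false
        System.out.println(containsWord("they aren't here", "are")); // false
    }
}
```

Note the use of find() rather than matches(): with find() there is no need to wrap the pattern in .* on both sides.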

Caveat: This also means that if your search term isn't actually a word (let's say you're looking for ###), then these word boundary anchors will only match in a string like aaa###zzz, but not in +++###+++.
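The caveat can be reproduced with a small sketch. The second helper below uses negative lookarounds instead of \b; that is an alternative swapped in for illustration, not something the answer proposes, and all names here (NonWordTermDemo, withBoundaries, withLookarounds) are hypothetical.

```java
import java.util.regex.Pattern;

final class NonWordTermDemo {
    // Wraps the literal term in \b anchors, as discussed in the answer.
    static boolean withBoundaries(String text, String term) {
        return Pattern.compile("\\b" + Pattern.quote(term) + "\\b").matcher(text).find();
    }

    // Alternative (an assumption of this sketch): require that the term is
    // neither preceded nor followed by a word character.
    static boolean withLookarounds(String text, String term) {
        return Pattern.compile("(?<!\\w)" + Pattern.quote(term) + "(?!\\w)").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(withBoundaries("aaa###zzz", "###"));  // true
        System.out.println(withBoundaries("+++###+++", "###"));  // false
        System.out.println(withLookarounds("+++###+++", "###")); // true
        System.out.println(withLookarounds("aaa###zzz", "###")); // false
    }
}
```

Which behaviour is right depends on what "a whole word" should mean for a term made of non-word characters.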

Further caveat: Java has by default a limited worldview on what constitutes an alphanumeric character. Only ASCII letters/digits (plus the underscore) count here, so word boundary anchors will fail on words like élève, relevé or ärgern. Read more about this (and how to solve this problem) here.
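One possible way to make the word-boundary check Unicode-aware, assuming Java 7 or later, is the Pattern.UNICODE_CHARACTER_CLASS flag. This is only a sketch and not necessarily the approach the link above describes; the class and method names are invented for the example, and the accented words are written with Unicode escapes so the source stays ASCII-only.

```java
import java.util.regex.Pattern;

final class UnicodeBoundaryDemo {
    // Hypothetical helper: whole-word search with Unicode-aware \w and \b (Java 7+).
    static boolean containsWord(String text, String word) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b",
                Pattern.UNICODE_CHARACTER_CLASS);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // The French word from the answer, spelled with escapes for the accented letters.
        String word = "\u00e9l\u00e8ve";
        System.out.println(containsWord("un " + word + " studieux", word)); // true
        System.out.println(containsWord("des " + word + "s", word));        // false
    }
}
```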

Tim Pietzcker answered Sep 18 '22 11:09


Method one should be faster because it has less overhead. If the concern is performance when searching huge files, a specialized algorithm such as Boyer-Moore pattern matching could lead to further improvements.
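To give an idea of what such a specialized search looks like, here is a minimal sketch of Boyer-Moore-Horspool, a simplified variant of Boyer-Moore that uses only a bad-character shift table. It is an illustration with made-up names (HorspoolSearch, indexOf), not a tuned implementation, and whether it beats String.indexOf in practice would need to be measured.

```java
import java.util.Arrays;

final class HorspoolSearch {
    // Returns the index of the first occurrence of needle in haystack, or -1.
    static int indexOf(String haystack, String needle) {
        int n = haystack.length();
        int m = needle.length();
        if (m == 0) return 0;
        if (m > n) return -1;

        // Shift table for characters below 256: how far the window may move
        // when that character is aligned with the last position of the needle.
        int[] shift = new int[256];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            char c = needle.charAt(i);
            if (c < 256) shift[c] = m - 1 - i;
        }

        int pos = 0;
        while (pos <= n - m) {
            // Compare the window against the needle from right to left.
            int j = m - 1;
            while (j >= 0 && haystack.charAt(pos + j) == needle.charAt(j)) j--;
            if (j < 0) return pos;

            // Characters outside the table fall back to a safe shift of 1.
            char last = haystack.charAt(pos + m - 1);
            pos += (last < 256) ? shift[last] : 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("how are you", "are")); // 4
        System.out.println(indexOf("how are you", "xyz")); // -1
    }
}
```

The gain comes from skipping several characters at once on a mismatch, which matters more for long needles and large inputs than for a short string like the one in the question.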

stacker answered Sep 21 '22 11:09


If you are looking for a fixed string, not a pattern, as in the example in your question, indexOf will be better (simpler) and faster, since it does not need to use regular expressions.

Also, if the string you are searching for does contain characters that have a special meaning in regular expressions, with indexOf you don't need to worry about escaping these characters.
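The escaping problem is easy to hit. In the sketch below (class and method names are hypothetical), the search term "1+1" contains a regex quantifier, so the unescaped regex silently looks for something else; Pattern.quote is one way to fix the regex version, while indexOf and String.contains need no escaping at all.

```java
import java.util.regex.Pattern;

final class EscapingDemo {
    static boolean viaIndexOf(String text, String term) {
        return text.indexOf(term) != -1;
    }

    // Concatenates the term into the regex unescaped, as in the question's second method.
    static boolean viaRawRegex(String text, String term) {
        return text.matches(".*" + term + ".*");
    }

    // Same regex, but with the term quoted so it is matched literally.
    static boolean viaQuotedRegex(String text, String term) {
        return text.matches(".*" + Pattern.quote(term) + ".*");
    }

    public static void main(String[] args) {
        String text = "what is 1+1?";
        String term = "1+1";
        System.out.println(viaIndexOf(text, term));     // true
        System.out.println(text.contains(term));        // true
        System.out.println(viaRawRegex(text, term));    // false: "1+1" means one or more 1s, then a 1
        System.out.println(viaQuotedRegex(text, term)); // true
    }
}
```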

In general, use indexOf where possible, and regex matching (matches) only where indexOf cannot do what you need.

Grodriguez answered Sep 19 '22 11:09