
Java match whole word in String

I have an ArrayList<String> which I iterate through to find the correct index given a String. Basically, given a String, the program should search through the list and find the index where the whole word matches. For example:

ArrayList<String> foo = new ArrayList<String>();
foo.add("AAAB_11232016.txt");
foo.add("BBB_12252016.txt");
foo.add("AAA_09212017.txt");

So if I give the String AAA, I should get back index 2 (the last one). So I can't use the contains() method as that would give me back index 0.

I tried with this code:

String str = "AAA";
String pattern = "\\b" + str + "\\b";
Pattern p = Pattern.compile(pattern);

for(int i = 0; i < foo.size(); i++) {
    // Check each entry of list to find the correct value
    Matcher match = p.matcher(foo.get(i));

    if(match.find() == true) {
        return i;
    }
}

Unfortunately, the condition in the if statement is never true, so the loop never returns an index. I'm not sure what I'm doing wrong.

Note: This should also work if I searched for AAA_0921, the full name AAA_09212017.txt, or any part of the String that is unique to it.

syy asked Jul 06 '16 16:07



1 Answer

Since an underscore is itself a word character, a word boundary \b does not match between a letter or digit and an underscore, so you need

String pattern = "(?<=_|\\b)" + str + "(?=_|\\b)";

Here, (?<=_|\b) positive lookbehind requires a word boundary or an underscore to appear before the str, and the (?=_|\b) positive lookahead requires an underscore or a word boundary to appear right after the str.
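As a minimal sketch of how this pattern could be plugged into the loop from the question: the class name WholeWordIndex and the method indexOfWholeWord are hypothetical names chosen for illustration, and returning -1 when nothing matches is an assumption, since the question does not say what should happen in that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class WholeWordIndex {
    // Returns the index of the first entry in which str appears delimited by
    // an underscore or a word boundary on each side, or -1 if none matches.
    // (The -1 fallback is an assumption; the question does not specify it.)
    static int indexOfWholeWord(List<String> entries, String str) {
        // str is inserted unquoted here, as in the first pattern above,
        // so it must not contain regex metacharacters.
        Pattern p = Pattern.compile("(?<=_|\\b)" + str + "(?=_|\\b)");
        for (int i = 0; i < entries.size(); i++) {
            if (p.matcher(entries.get(i)).find()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> foo = new ArrayList<>();
        foo.add("AAAB_11232016.txt");
        foo.add("BBB_12252016.txt");
        foo.add("AAA_09212017.txt");

        System.out.println(indexOfWholeWord(foo, "AAA")); // prints 2
        System.out.println(indexOfWholeWord(foo, "BBB")); // prints 1
    }
}
```

AAAB_11232016.txt is skipped because the character after AAA is B, which is neither an underscore nor at a word boundary.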


If your word may contain special regex characters, quote it with Pattern.quote and use a more explicit word boundary that treats the underscore as a non-word character:

"(?<![^\\W_])" + Pattern.quote(str) + "(?![^\\W_])"

Here, the negative lookbehind (?<![^\W_]) fails the match if the character immediately before the str is a word character other than an underscore, i.e. a letter or digit. ([^...] is a negated character class that matches any character other than the characters, ranges, etc. defined inside it, so [^\W_] matches any character that is neither a non-word char \W nor a _.) Likewise, the negative lookahead (?![^\W_]) fails the match if a word char other than the underscore immediately follows the str.

Note that the second example quotes the search string, so the . in AA.A is matched literally and AA.A is still found in AA.A_str.txt.
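The quoted variant can be sketched the same way. The class name QuotedWordMatch and the helper containsWord are hypothetical names used only for this illustration; the pattern itself is the one given above.

```java
import java.util.regex.Pattern;

class QuotedWordMatch {
    // True if word occurs in text with no letter or digit directly before or
    // after it. Pattern.quote makes every character of word match literally.
    static boolean containsWord(String text, String word) {
        Pattern p = Pattern.compile(
                "(?<![^\\W_])" + Pattern.quote(word) + "(?![^\\W_])");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("AA.A_str.txt", "AA.A"));     // prints true
        System.out.println(containsWord("AAXA_str.txt", "AA.A"));     // prints false
        System.out.println(containsWord("AAAB_11232016.txt", "AAA")); // prints false
        System.out.println(containsWord("AAA_09212017.txt", "AAA"));  // prints true
    }
}
```

Without Pattern.quote, the . in AA.A would match any character, so AAXA_str.txt would also be reported as a match.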


Wiktor Stribiżew answered Oct 07 '22 19:10