I have an ArrayList<String> which I iterate through to find the index of the entry that matches a given String. Given a String, the program should search through the list and return the index where the whole word matches. For example:
ArrayList<String> foo = new ArrayList<String>();
foo.add("AAAB_11232016.txt");
foo.add("BBB_12252016.txt");
foo.add("AAA_09212017.txt");
So if I give the String AAA, I should get back index 2 (the last one). I can't use the contains() method, as that would give me back index 0.
I tried with this code:
String str = "AAA";
String pattern = "\\b" + str + "\\b";
Pattern p = Pattern.compile(pattern);
for (int i = 0; i < foo.size(); i++) {
    // Check each entry of list to find the correct value
    Matcher match = p.matcher(foo.get(i));
    if (match.find()) {
        return i;
    }
}
Unfortunately, this code never reaches the body of the if statement inside the loop. I'm not sure what I'm doing wrong.
Note: This should also work if I searched for AAA_0921, the full name AAA_09212017.txt, or any part of the String that is unique to it.
The metacharacter \b matches word boundaries, i.e., it matches before the first and after the last word character of a string, and between a word character and a non-word character.
In Java, a String is a class that represents a sequence of characters. To find a word in a string, you can use the indexOf() and contains() methods of the String class. The indexOf() method returns the index of the first occurrence of the specified substring in the string, or -1 if it does not occur.
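As a rough illustration of why plain substring search is not enough for the list in the question, here is a minimal sketch. The class name SubstringSearchDemo and the helper firstContaining are made up for this example; only contains() and indexOf() come from the String class.

```java
import java.util.Arrays;
import java.util.List;

class SubstringSearchDemo {
    // Hypothetical helper: index of the first entry that contains needle
    // as a plain substring, or -1 if no entry contains it.
    static int firstContaining(List<String> entries, String needle) {
        for (int i = 0; i < entries.size(); i++) {
            if (entries.get(i).contains(needle)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> foo = Arrays.asList(
                "AAAB_11232016.txt", "BBB_12252016.txt", "AAA_09212017.txt");

        // contains() has no notion of whole words, so "AAAB..." matches first.
        System.out.println(firstContaining(foo, "AAA")); // prints 0

        // indexOf() reports the position inside a single string.
        System.out.println("AAA_09212017.txt".indexOf("0921")); // prints 4
    }
}
```

The first result is 0 rather than the wanted 2, which is why a pattern with some form of boundary check is needed.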
A word boundary, in most regex dialects, is a position between \w and \W (non-word char), or at the beginning or end of a string if it begins or ends (respectively) with a word character ( [0-9A-Za-z_] ). So, in the string "-12" , it would match before the 1 or after the 2. The dash is not a word character.
A word boundary is a zero-width test between two characters. To pass the test, there must be a word character on one side, and a non-word character on the other side. It does not matter which side each character appears on, but there must be one of each.
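The boundary rules above can be checked with a small sketch. The class name WordBoundaryDemo and the helper foundWithBoundaries are invented for illustration, and the helper assumes the word contains no regex metacharacters.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: true if \bword\b is found anywhere in text.
    // Assumes word has no regex metacharacters (it is not quoted here).
    static boolean foundWithBoundaries(String word, String text) {
        return Pattern.compile("\\b" + word + "\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        // A space is a non-word character, so there is a boundary after AAA.
        System.out.println(foundWithBoundaries("AAA", "AAA 09212017.txt")); // true

        // An underscore is a word character, so there is no boundary after AAA.
        System.out.println(foundWithBoundaries("AAA", "AAA_09212017.txt")); // false

        // The dash is a non-word character: boundaries before 1 and after 2.
        System.out.println(foundWithBoundaries("12", "-12")); // true
    }
}
```

The second case is the one from the question: A and _ are both word characters, so \b cannot match between them.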
Since an underscore is itself a word character, \b does not match between a letter or digit and an underscore, so you need
String pattern = "(?<=_|\\b)" + str + "(?=_|\\b)";
Here, the (?<=_|\b) positive lookbehind requires an underscore or a word boundary to appear immediately before str, and the (?=_|\b) positive lookahead requires an underscore or a word boundary to appear immediately after str.
See this regex demo.
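Put together with the loop from the question, the lookaround pattern might be used as in the sketch below. The class name UnderscoreBoundarySearch and the method indexOfWord are made up for this example, and returning -1 when nothing matches is an assumption, since the question does not say what should happen in that case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class UnderscoreBoundarySearch {
    // Hypothetical helper: index of the first entry in which str appears
    // delimited by an underscore or a word boundary on both sides, or -1.
    // str is inserted unquoted, so regex metacharacters in it stay active.
    static int indexOfWord(List<String> entries, String str) {
        Pattern p = Pattern.compile("(?<=_|\\b)" + str + "(?=_|\\b)");
        for (int i = 0; i < entries.size(); i++) {
            if (p.matcher(entries.get(i)).find()) {
                return i;
            }
        }
        return -1; // assumption: -1 signals "not found"
    }

    public static void main(String[] args) {
        List<String> foo = Arrays.asList(
                "AAAB_11232016.txt", "BBB_12252016.txt", "AAA_09212017.txt");

        System.out.println(indexOfWord(foo, "AAA"));              // prints 2
        System.out.println(indexOfWord(foo, "BBB"));              // prints 1
        System.out.println(indexOfWord(foo, "AAA_09212017.txt")); // prints 2
    }
}
```

With this pattern, a partial name such as AAA_0921 is not found in this list, because it is followed by the digit 2 rather than by an underscore or a word boundary.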
If your word may contain special characters, you might want to use a more straightforward word boundary:
"(?<![^\\W_])" + Pattern.quote(str) + "(?![^\\W_])"
Here, the negative lookbehind (?<![^\W_]) fails the match if there is a word character other than an underscore immediately before str ([^...] is a negated character class that matches any character other than the characters, ranges, etc. defined inside it, so [^\W_] matches any character that is neither a non-word character \W nor a _), and the (?![^\W_]) negative lookahead fails the match if there is a word character other than an underscore immediately after str.
Note that the second example quotes the search string with Pattern.quote(), so its characters are matched literally; even AA.A_str.txt can be matched correctly with AA.A.
See another regex demo
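A minimal sketch of the quoted variant follows. The class name QuotedWordSearch, the method indexOfWord, the -1 return value, and the extra entry AAxA_str.txt are all assumptions added for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class QuotedWordSearch {
    // Hypothetical helper: index of the first entry in which str appears
    // with no letter or digit directly before or after it, or -1.
    // Pattern.quote makes every character of str match literally.
    static int indexOfWord(List<String> entries, String str) {
        Pattern p = Pattern.compile(
                "(?<![^\\W_])" + Pattern.quote(str) + "(?![^\\W_])");
        for (int i = 0; i < entries.size(); i++) {
            if (p.matcher(entries.get(i)).find()) {
                return i;
            }
        }
        return -1; // assumption: -1 signals "not found"
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "AAAB_11232016.txt", "AAxA_str.txt", "AA.A_str.txt");

        // The dot is literal here, so AAxA_str.txt is skipped.
        System.out.println(indexOfWord(names, "AA.A")); // prints 2

        List<String> foo = Arrays.asList(
                "AAAB_11232016.txt", "BBB_12252016.txt", "AAA_09212017.txt");
        System.out.println(indexOfWord(foo, "AAA")); // prints 2
    }
}
```

Without Pattern.quote, the dot in AA.A would match any character and the search would stop at AAxA_str.txt instead.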