I want to write a pattern whose capture group ends at the first occurrence of the “boundary”. Right now it ends at the last one.
E.g.:
String text = "this should match from A to the first B and not 2nd B, got that?";
Pattern ptrn = Pattern.compile("\\b(A.*B)\\b");
Matcher mtchr = ptrn.matcher(text);
while (mtchr.find()) {
    String match = mtchr.group();
    System.out.println("Match = <" + match + ">");
}
prints:
"Match = <A to the first B and not 2nd B>"
and I want it to print:
"Match = <A to the first B>"
What do I need to change within the pattern?
In a Java string literal, "\b" is the backspace character (char 0x08); used as a regex, it matches a literal backspace. To get the regex word boundary \b, write "\\b" in Java source.
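A small sketch of how the Java string literals "\b" and "\\b" behave when compiled as regexes. The class name and the countMatches helper are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackspaceVsBoundary {
    // Hypothetical helper: counts how many times regex matches inside input.
    public static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "from A to B";
        // "\\b" reaches the regex engine as \b, the word boundary.
        System.out.println(countMatches("\\bA\\b", text));   // 1
        // "\b" reaches the regex engine as the single char 0x08 (backspace),
        // so this looks for backspace, 'A', backspace and finds nothing here.
        System.out.println(countMatches("\bA\b", text));     // 0
        // It does match when the input really contains backspaces.
        System.out.println(countMatches("\bA\b", "\bA\b"));  // 1
    }
}
```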
To match a character that has a special meaning in regex, prefix it with a backslash (\). E.g., \. matches "."; \+ matches "+"; and \( matches "(". To match a backslash itself, use \\.
The expression \w will match any word character. Word characters include alphanumeric characters (a-z, A-Z and 0-9) and underscores (_). \W matches any non-word character.
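A quick way to check this, assuming Java's default flags (no UNICODE_CHARACTER_CLASS); the class and helper names are illustrative only:

```java
import java.util.regex.Pattern;

public class WordCharDemo {
    // Hypothetical helper: true if the single character c matches \w.
    public static boolean isWordChar(char c) {
        return Pattern.matches("\\w", String.valueOf(c));
    }

    // Hypothetical helper: removes every character matched by \W.
    public static String stripNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        System.out.println(isWordChar('a'));  // true
        System.out.println(isWordChar('7'));  // true
        System.out.println(isWordChar('_'));  // true
        System.out.println(isWordChar('-'));  // false
        System.out.println(stripNonWord("first-B, 2nd_B!"));  // firstB2nd_B
    }
}
```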
Characters can be escaped in a Java regex in two ways: using \Q and \E to quote a whole span, or using a backslash (written \\ in a Java string literal) before each special character.
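Both escaping styles can be tried side by side. This is only a sketch: the sample literal "a.b(c)+" and the matchesWhole helper are invented for the example, and Pattern.quote is the standard library shortcut that wraps a string in \Q...\E for you:

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // Hypothetical helper: true if regex matches the entire input.
    public static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String literal = "a.b(c)+";

        // Unescaped, the metacharacters keep their special meaning.
        System.out.println(matchesWhole("a.b(c)+", literal));   // false
        System.out.println(matchesWhole("a.b(c)+", "axbcc"));   // true

        // Backslash before each special character.
        System.out.println(matchesWhole("a\\.b\\(c\\)\\+", literal));  // true

        // \Q...\E quotes everything in between.
        System.out.println(matchesWhole("\\Qa.b(c)+\\E", literal));    // true

        // Pattern.quote builds the \Q...\E form from a plain string.
        System.out.println(matchesWhole(Pattern.quote(literal), literal));  // true
    }
}
```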
Make your * non-greedy / reluctant by using *?:
Pattern ptrn = Pattern.compile("\\b(A.*?B)\\b");
By default, the pattern will behave greedily, and match as many characters as possible to satisfy the pattern, that is, up until the last B.
See Reluctant Quantifiers from the docs, and this tutorial.
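To see the two quantifiers side by side on the text from the question, something like the following should do. The class name and the findAll helper are just for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical helper: collects capture group 1 of every match.
    public static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group(1));
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "this should match from A to the first B and not 2nd B, got that?";

        // Greedy: .* grabs as much as it can, then backtracks to the last B.
        System.out.println(findAll("\\b(A.*B)\\b", text));
        // [A to the first B and not 2nd B]

        // Reluctant: .*? grows one character at a time and stops at the first B.
        System.out.println(findAll("\\b(A.*?B)\\b", text));
        // [A to the first B]
    }
}
```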
Don't use a greedy expression for matching, i.e.:
Pattern ptrn = Pattern.compile("\\b(A.*?B)\\b");