I have the following code:
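import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void createTokens() {
    String test = "test is a word word word word big small";
    // Both capture groups use the lazy quantifier .+?
    Matcher mtch = Pattern.compile("test is a (\\s*.+?\\s*) word (\\s*.+?\\s*)").matcher(test);
    while (mtch.find()) {
        // Print every capture group of the current match
        for (int i = 1; i <= mtch.groupCount(); i++) {
            System.out.println(mtch.group(i));
        }
    }
}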
And I get the following output:
word
w
But I expected:
word
word
Could somebody please explain why this happens?
A non-greedy match means that the regex engine matches as few characters as possible while still allowing the whole pattern to match the given string.
Greedy matching means that the expression will match as large a group as possible, while non-greedy means it will match the smallest group possible.
You make it non-greedy by using ".*?". With this construct, each time the regex engine consumes a character with the ".", it first attempts to match whatever may come after the ".*?". This means that if, for instance, nothing comes after the ".*?", it matches nothing.
Regular expression quantifiers are greedy by default: an expression with repetition will attempt to match as many characters as possible. The asterisk ( * ), plus ( + ), question mark ( ? ), and curly braces ( {} ) metacharacters all specify repetition, and each attempts to match as many instances as possible.
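As a minimal sketch of the greedy versus non-greedy difference, the following compares three patterns on a small made-up input. The class name GreedyVsLazy, the helper firstGroup, and the sample string "<a><b>" are assumptions chosen for illustration and are not part of the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .+ grabs as much as it can, then backs off until '>' matches.
        System.out.println(firstGroup("<(.+)>", input));   // a><b
        // Lazy: .+? grows one character at a time until '>' can match.
        System.out.println(firstGroup("<(.+?)>", input));  // a
        // Lazy with nothing after it: stops at the one character '+' requires.
        System.out.println(firstGroup("<(.+?)", input));   // a
    }
}
```

The last call is the interesting one: when nothing follows a lazy group, it settles for the minimum it is allowed to match.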
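Because your groups are non-greedy, they match as little text as possible while still producing a match. Nothing follows the second group in the pattern, so its .+? stops after the single character it is required to match, which is "w".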
Remove the ? after .+ in the second group:
Matcher mtch = Pattern.compile("test is a (\\s*.+?\\s*) word (\\s*.+\\s*)").matcher(test);
and you'll get
word
word word big small
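For comparison, here is a small sketch that runs the original pattern, the pattern with a greedy second group, and one possible alternative on the string from the question. The class name TokenDemo and the helper groups are made up for this example, and the \\S+ variant is only an assumption about what was intended (one whitespace-free token per group), not the only way to get "word" and "word".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenDemo {
    static final String TEST = "test is a word word word word big small";

    // Hypothetical helper: collects the capture groups of the first match (empty list if none).
    static List<String> groups(String regex) {
        Matcher m = Pattern.compile(regex).matcher(TEST);
        List<String> out = new ArrayList<>();
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Original: both groups lazy, so the second stops after one character.
        System.out.println(groups("test is a (\\s*.+?\\s*) word (\\s*.+?\\s*)")); // [word, w]
        // Second group greedy: it runs to the end of the input.
        System.out.println(groups("test is a (\\s*.+?\\s*) word (\\s*.+\\s*)"));  // [word, word word big small]
        // Assumed intent: each group is one run of non-whitespace characters.
        System.out.println(groups("test is a (\\S+) word (\\S+)"));               // [word, word]
    }
}
```

With \\S+ the quantifier stays greedy, but the character class cannot cross a space, so each group ends at the word boundary.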