
Non-greedy Regular Expression in Java

I have the following code:

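import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void createTokens(){
    String test = "test is a word word word word big small";
    // Both capture groups use the lazy quantifier .+?
    Matcher mtch = Pattern.compile("test is a (\\s*.+?\\s*) word (\\s*.+?\\s*)").matcher(test);
    while (mtch.find()){
        // Print every capture group of the current match
        for (int i = 1; i <= mtch.groupCount(); i++){
            System.out.println(mtch.group(i));
        }
    }
}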

And I get the following output:

word
w

But I expected it to be:

word
word

Can somebody please explain why?

asked Jan 19 '12 at 18:01 by Divers



1 Answer

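Because your quantifiers are non-greedy (lazy), they match as little text as possible while still producing a match. Nothing in the pattern follows the second group, so its .+? is satisfied after a single character, which is why you get "w".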
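To see the effect in isolation, here is a minimal sketch. The class name LazyQuantifierDemo, the helper firstGroup, and the shortened input string are made up for illustration and are not part of the question's code. It compares a lazy group with nothing after it, a lazy group followed by a required suffix, and a greedy group:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyQuantifierDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if there is no match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "word word big small";

        // Nothing follows the lazy group, so one character is already a valid match.
        System.out.println(firstGroup("(.+?)", input));      // w

        // A required suffix forces the lazy group to grow until " big" can match.
        System.out.println(firstGroup("(.+?) big", input));  // word word

        // The greedy form takes as much as it can.
        System.out.println(firstGroup("(.+)", input));       // word word big small
    }
}
```

The first group in the question behaves like the second case (it has to grow until " word " can follow), while the second group behaves like the first case.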

Remove the ? in the second group to make it greedy, and you'll get:
word
word word big small

Matcher mtch = Pattern.compile("test is a (\\s*.+?\\s*) word (\\s*.+\\s*)").matcher(test);
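For comparison, the sketch below runs the original pattern, the greedy variant above, and a third variant side by side. The third pattern is my own assumption about what the asker wanted (a single whitespace-free token in each group, which gives "word" twice); it is not from the question. GroupComparison and groups are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupComparison {
    // Hypothetical helper: collects every capture group of every match, in order.
    static List<String> groups(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String test = "test is a word word word word big small";

        // Original pattern: both groups lazy.
        System.out.println(groups("test is a (\\s*.+?\\s*) word (\\s*.+?\\s*)", test));
        // prints [word, w]

        // Second group greedy, as suggested above.
        System.out.println(groups("test is a (\\s*.+?\\s*) word (\\s*.+\\s*)", test));
        // prints [word, word word big small]

        // Assumed intent: each group is one run of non-whitespace characters.
        System.out.println(groups("test is a (\\S+) word (\\S+)", test));
        // prints [word, word]
    }
}
```

Using \S+ removes the dependence on greediness here, because the group cannot extend past the next space either way.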
answered Sep 27 '22 at 22:09 by theglauber