
Java Regex : How to match one or more space characters

Tags: java, regex

How do you match more than one space character in Java regex?

I have a regex I am trying to match. The regex fails when I have two or more space characters.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        // Match 'fruit' not followed by a word that begins with 'a'
        String pattern = "\\b(fruit)\\s+([^a]+\\w+)\\b";
        String str = "fruit apple";       // One space: not matched
        String str_fail = "fruit  apple"; // Two spaces: matched
        System.out.println(preg_match(pattern, str));      // false (that's what I want)
        System.out.println(preg_match(pattern, str_fail)); // true (the regex fails here)
    }

    // Returns true if the pattern is found anywhere in the subject.
    public static boolean preg_match(String pattern, String subject) {
        Pattern regex = Pattern.compile(pattern);
        Matcher regexMatcher = regex.matcher(subject);
        return regexMatcher.find();
    }
}
asked Jun 07 '12 14:06 by MontrealDevOne



1 Answer

The problem is actually because of backtracking. Your regex:

 "\\b(fruit)\\s+([^a]+\\w+)\\b"

says "fruit, followed by one or more whitespace characters, followed by one or more non-'a' characters, followed by one or more 'word' characters". With two spaces, \s+ first consumes both of them, but then [^a]+ cannot match the 'a' of "apple", so the engine backtracks: \s+ gives back the second space and keeps only the first. The second space then satisfies [^a]+ (a space is not an 'a'), \w+ matches "apple", and the whole pattern succeeds.

I think you can fix it by simply using the possessive quantifier instead, which would be \s++. This tells \s not to give back the second space character once it has matched it. Java's quantifiers are documented in the Javadoc for java.util.regex.Pattern.
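To check the suggestion, here is a minimal sketch that runs the question's pattern with \s+ swapped for the possessive \s++. The class name PossessiveFix, the helper matches, and the two "banana" inputs are made up for illustration; only the pattern and the two "apple" strings come from the question.

```java
import java.util.regex.Pattern;

class PossessiveFix {
    // The question's pattern, with \s+ replaced by the possessive \s++
    // so the whitespace run is never given back during backtracking.
    static final Pattern POSSESSIVE =
            Pattern.compile("\\b(fruit)\\s++([^a]+\\w+)\\b");

    // Hypothetical helper: true if the pattern is found anywhere in the subject.
    static boolean matches(String subject) {
        return POSSESSIVE.matcher(subject).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("fruit apple"));   // false (one space)
        System.out.println(matches("fruit  apple"));  // false (two spaces, now rejected)
        System.out.println(matches("fruit banana"));  // true
        System.out.println(matches("fruit  banana")); // true
    }
}
```

With the possessive quantifier, both "apple" inputs are rejected no matter how many spaces come first, while words that do not start with 'a' still match.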


As an illustration, here are two examples at Rubular:

  1. Using the possessive quantifier on \s (gives the expected results, from what you describe)
  2. Your current regex with separate groupings around [^a]+ and \w+. Notice that the second match group (representing the [^a]+) is capturing the second space character.
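Rubular uses Ruby's regex engine, so the same experiment can also be sketched directly in Java. The version below is an illustration only: it puts \s+, [^a]+ and \w+ each in its own capturing group (a slight variation on the grouping described above), and the class name GroupDemo and the helper captures are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // The question's pattern with \s+, [^a]+ and \w+ each in its own group,
    // so we can see which part consumed which characters.
    static final Pattern GROUPED =
            Pattern.compile("\\b(fruit)(\\s+)([^a]+)(\\w+)\\b");

    // Hypothetical helper: returns the text captured by \s+, [^a]+ and \w+,
    // or null if the pattern is not found.
    static String[] captures(String subject) {
        Matcher m = GROUPED.matcher(subject);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] c = captures("fruit  apple"); // two spaces
        System.out.println("\\s+   -> [" + c[0] + "]"); // prints: \s+   -> [ ]
        System.out.println("[^a]+ -> [" + c[1] + "]"); // prints: [^a]+ -> [ ]
        System.out.println("\\w+   -> [" + c[2] + "]"); // prints: \w+   -> [apple]
    }
}
```

The brackets in the output make the single space visible: \s+ keeps only the first space, and [^a]+ takes the second one.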
answered Oct 28 '22 13:10 by eldarerathis