How do you match more than one space character in Java regex?
I have a regex I am trying to match. The regex fails when I have two or more space characters.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Match 'fruit' not followed by a word that begins with 'a'
        String pattern = "\\b(fruit)\\s+([^a]+\\w+)\\b";
        String str = "fruit apple"; // One space character: not matched
        String str_fail = "fruit  apple"; // Two space characters: matched
        System.out.println(preg_match(pattern, str)); // false (that's what I want)
        System.out.println(preg_match(pattern, str_fail)); // true (regex fails)
    }

    // Returns true if the pattern is found anywhere in the subject
    public static boolean preg_match(String pattern, String subject) {
        Pattern regex = Pattern.compile(pattern);
        Matcher regexMatcher = regex.matcher(subject);
        return regexMatcher.find();
    }
}
The plus sign + is a greedy quantifier, which means one or more times. For example, expression X+ matches one or more X characters. Therefore, the regular expression \s matches a single whitespace character, while \s+ will match one or more whitespace characters.
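To make the difference between \s and \s+ concrete, here is a minimal sketch. The class name WhitespaceQuantifierDemo and the helper matchesWhole are made up for illustration; only Pattern.matches comes from the standard library, and it requires the whole input to match.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class WhitespaceQuantifierDemo {
    // True when the entire input is matched by the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("\\s", " "));      // true: exactly one whitespace character
        System.out.println(matchesWhole("\\s", "  "));     // false: \s alone covers only one character
        System.out.println(matchesWhole("\\s+", "  "));    // true: \s+ covers one or more
        System.out.println(matchesWhole("\\s+", " \t\n")); // true: tabs and newlines count as whitespace
    }
}
```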
If you're looking for one or more literal spaces, it's "  *" (that's two spaces and an asterisk) or " +" (one space and a plus). If you're looking for common spacing, use "[ X]" or "[ X][ X]*" or "[ X]+", where X is the physical tab character (and each is preceded by a single space in all those examples).
+: one or more ( 1+ ), e.g., [0-9]+ matches one or more digits such as '123' , '000' . *: zero or more ( 0+ ), e.g., [0-9]* matches zero or more digits. It accepts all those in [0-9]+ plus the empty string.
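A short sketch of the + versus * distinction, using the [0-9] example from the text. The class and method names are hypothetical; the only point is that * additionally accepts the empty string.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for comparing the + and * quantifiers.
class PlusVersusStarDemo {
    // One or more digits.
    static boolean oneOrMoreDigits(String s) {
        return Pattern.matches("[0-9]+", s);
    }

    // Zero or more digits (so the empty string is accepted too).
    static boolean zeroOrMoreDigits(String s) {
        return Pattern.matches("[0-9]*", s);
    }

    public static void main(String[] args) {
        System.out.println(oneOrMoreDigits("123"));  // true
        System.out.println(zeroOrMoreDigits("123")); // true
        System.out.println(oneOrMoreDigits(""));     // false: + needs at least one digit
        System.out.println(zeroOrMoreDigits(""));    // true: * accepts the empty string
    }
}
```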
Yes, also your regex will match if there are just spaces.
The problem is actually caused by backtracking. Your regex:
"\\b(fruit)\\s+([^a]+\\w+)\\b"
says "fruit, followed by one or more whitespace characters, followed by one or more non-'a' characters, followed by one or more 'word' characters". The reason this fails with two spaces is that \s+ initially matches both spaces, but then gives back the second one when [^a]+ cannot match the 'a'. The second space then satisfies the [^a]+ portion, while the first space still satisfies \s+.
I think you can fix it by simply using the possessive quantifier instead, which would be \s++. This tells \s+ not to give back the second space character. Java's quantifiers are documented in the Javadoc for java.util.regex.Pattern.
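The behaviour described above can be checked with a small sketch that runs the question's pattern and the possessive variant side by side. The class name, constant names, and the extra "fruit banana" input are assumptions added for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing greedy \s+ with possessive \s++.
class PossessiveWhitespaceDemo {
    // Pattern from the question: greedy \s+ may give a space back to [^a]+.
    static final Pattern GREEDY = Pattern.compile("\\b(fruit)\\s+([^a]+\\w+)\\b");
    // Same pattern with possessive \s++, which keeps every whitespace character it consumed.
    static final Pattern POSSESSIVE = Pattern.compile("\\b(fruit)\\s++([^a]+\\w+)\\b");

    static boolean found(Pattern pattern, String subject) {
        return pattern.matcher(subject).find();
    }

    public static void main(String[] args) {
        String oneSpace = "fruit apple";
        String twoSpaces = "fruit  apple"; // two spaces between the words
        String other = "fruit banana";     // extra input, not from the question

        System.out.println(found(GREEDY, oneSpace));      // false
        System.out.println(found(GREEDY, twoSpaces));     // true: the unwanted match
        System.out.println(found(POSSESSIVE, oneSpace));  // false
        System.out.println(found(POSSESSIVE, twoSpaces)); // false: no space is given back
        System.out.println(found(POSSESSIVE, other));     // true: word does not begin with 'a'
    }
}
```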
As an illustration, here are two examples at Rubular:
Using the possessive quantifier on \s (gives expected results, from what you describe)
Your regex with separate groups around [^a]+ and \w+. Notice that the second match group (representing the [^a]+) is capturing the second space character.
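The group capture can also be inspected locally. The sketch below assumes a variant of the question's pattern in which [^a]+ and \w+ each get their own group; that split pattern is a reconstruction for illustration, not necessarily the exact one used in the Rubular example, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class that exposes what each group captured.
class CaptureGroupInspection {
    // Assumed variant of the question's regex with [^a]+ and \w+ in separate groups.
    static final Pattern SPLIT_GROUPS = Pattern.compile("\\b(fruit)\\s+([^a]+)(\\w+)\\b");

    // Returns groups 1..3 of the first match, or null when nothing matches.
    static String[] groups(String subject) {
        Matcher matcher = SPLIT_GROUPS.matcher(subject);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        String[] captured = groups("fruit  apple"); // two spaces between the words
        // Brackets make the captured space visible in the output.
        System.out.println("group 1 = [" + captured[0] + "]"); // group 1 = [fruit]
        System.out.println("group 2 = [" + captured[1] + "]"); // group 2 = [ ]
        System.out.println("group 3 = [" + captured[2] + "]"); // group 3 = [apple]
    }
}
```

With a single space between the words, groups returns null, because \s+ cannot give up its only space.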