\b doesn't match when the preceding character is a word boundary

Tags: java, regex

I have a rather peculiar problem. I'm trying to find a pattern like [some string][word boundary]. Simplified, my code is:

final Pattern pattern = Pattern.compile(Pattern.quote(someString) + "\\b");
final String value = someString + " ";
System.out.println(pattern.matcher(value).find());

My logic tells me this should always output true, regardless of what someString is. However:

  • if someString ends with a word character (e.g. "abc"), it prints true;
  • if someString ends with a word boundary (e.g. "abc."), it prints false.

Any ideas what is happening? My current workaround is to use \W instead of \b, but I'm not sure of the implications.

Felix asked Jul 04 '12 13:07


2 Answers

The position between a dot and a space is not a word boundary.

A word boundary is the position between a word character and a non-word character, or vice versa,
i.e. between [a-zA-Z0-9_][^a-zA-Z0-9_] or [^a-zA-Z0-9_][a-zA-Z0-9_].
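
To make the rule concrete, here is a small sketch that runs the pattern from the question against a few inputs. The class name WordBoundaryDemo and the helper boundaryAfter are made up for this illustration; only java.util.regex.Pattern comes from the JDK.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical helper: true if `literal` occurs in `input`
    // and is immediately followed by a word boundary (\b).
    static boolean boundaryAfter(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal) + "\\b")
                      .matcher(input)
                      .find();
    }

    public static void main(String[] args) {
        // 'c' is a word character, ' ' is not: boundary between them.
        System.out.println(boundaryAfter("abc", "abc "));   // true
        // '.' and ' ' are both non-word characters: no boundary.
        System.out.println(boundaryAfter("abc.", "abc. ")); // false
        // '.' is a non-word character, 'd' is a word character: boundary.
        System.out.println(boundaryAfter("abc.", "abc.d")); // true
    }
}
```

The third case shows the "vice versa" direction: after a trailing dot, \b only matches when a word character follows.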

Bohemian answered Oct 23 '22 21:10


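A word boundary is a zero-width position between a word character and a non-word character, in either order. The position between a period and a space lies between two non-word characters, so it does not meet this requirement.

Using \W instead matches any single non-word character, without the condition that a word character sits on the other side, which seems correct for your example. There are two implications to be aware of: unlike \b, \W consumes the character it matches, so that character becomes part of the match, and it needs a character to be present, so it cannot match at the very end of the input.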
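
As a rough illustration of those implications, the sketch below compares \W with a negative lookahead, (?!\w), which is one possible alternative if the intent is "not followed by a word character". The lookahead is a suggestion of this sketch, not something from the question, and the class name SuffixDemo and helper findWith are hypothetical.

```java
import java.util.regex.Pattern;

public class SuffixDemo {
    // Hypothetical helper: searches `input` for the quoted `literal`
    // followed by the raw regex fragment `suffixRegex`.
    static boolean findWith(String literal, String suffixRegex, String input) {
        return Pattern.compile(Pattern.quote(literal) + suffixRegex)
                      .matcher(input)
                      .find();
    }

    public static void main(String[] args) {
        // \W needs an actual non-word character after the literal.
        System.out.println(findWith("abc.", "\\W", "abc. "));     // true
        System.out.println(findWith("abc.", "\\W", "abc."));      // false (end of input)

        // (?!\w) is zero-width: it only asserts that no word character follows.
        System.out.println(findWith("abc.", "(?!\\w)", "abc. ")); // true
        System.out.println(findWith("abc.", "(?!\\w)", "abc."));  // true (end of input)
        System.out.println(findWith("abc", "(?!\\w)", "abcd"));   // false ('d' follows)
    }
}
```

Whether \W or the lookahead fits better depends on whether the trailing character should be part of the match and whether the string may appear at the very end of the input.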

Rich O'Kelly answered Oct 23 '22 22:10