
Java Regex : match whole word with word boundary

Tags:

java

string

regex

I am trying to check whether a string contains a word as a whole, using Java. Below are some examples:

Text : "A quick brown fox"
Words:
"qui" - false
"quick" - true
"quick brown" - true
"ox" - false
"A" - true

Below is my code:

String pattern = "\\b(<word>)\\b";
String s = "ox";
String text = "A quick brown fox".toLowerCase();
System.out.println(Pattern.compile(pattern.replaceAll("<word>", s.toLowerCase())).matcher(text).find());

It works fine with strings like the ones in the example above. However, I get incorrect results if the input string contains characters such as % or (, e.g.:

Text : "c14, 50%; something (in) bracket"
Words:
"c14, 50%;" : false (expected true)
"(in) bracket" : false (expected true)

It has something to do with my regex pattern (or maybe my whole approach to the matching is wrong). Could anyone suggest a better approach?

Darshan Mehta asked Mar 20 '17 13:03


1 Answer

It appears you only want to match "words" that are surrounded by whitespace (or sit at the start/end of the string).

Use

String pattern = "(?<!\\S)" + Pattern.quote(word) + "(?!\\S)";

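The (?<!\S) negative lookbehind fails any match that is immediately preceded by a non-whitespace char, and the (?!\S) negative lookahead fails any match that is immediately followed by a non-whitespace char. Both succeed at the start/end of the string, where there is no adjacent char at all. Pattern.quote() is necessary to escape special chars such as ( or % so that they are treated as literal chars in the regex pattern.

The original \b approach fails on these inputs because \b only matches between a word char and a non-word char: in "c14, 50%;" the trailing ; is followed by a space, and both are non-word chars, so there is no word boundary after it. Similarly, there is none before the ( in "(in) bracket", which is preceded by a space.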
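As a minimal sketch of how the pattern above could be wrapped up and checked against the inputs from the question: the class name WholeWordMatch and the helper containsWholeWord are hypothetical names chosen for illustration, and Pattern.CASE_INSENSITIVE is assumed here as a stand-in for the toLowerCase() calls in the question (by default it only folds ASCII letters).

```java
import java.util.regex.Pattern;

final class WholeWordMatch {

    // Hypothetical helper: returns true if `word` occurs in `text`
    // delimited by whitespace or by the start/end of the string.
    static boolean containsWholeWord(String text, String word) {
        // Pattern.quote() makes chars like '(' and '%' literal.
        String regex = "(?<!\\S)" + Pattern.quote(word) + "(?!\\S)";
        // Assumption: case-insensitive matching replaces toLowerCase().
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                      .matcher(text)
                      .find();
    }

    public static void main(String[] args) {
        String text1 = "A quick brown fox";
        System.out.println(containsWholeWord(text1, "qui"));          // false
        System.out.println(containsWholeWord(text1, "quick"));        // true
        System.out.println(containsWholeWord(text1, "quick brown"));  // true
        System.out.println(containsWholeWord(text1, "ox"));           // false
        System.out.println(containsWholeWord(text1, "A"));            // true

        String text2 = "c14, 50%; something (in) bracket";
        System.out.println(containsWholeWord(text2, "c14, 50%;"));    // true
        System.out.println(containsWholeWord(text2, "(in) bracket")); // true
    }
}
```

Note that with whitespace-only delimiters, a word followed directly by punctuation does not match: containsWholeWord("A quick, brown fox", "quick") returns false. If punctuation should also count as a delimiter, the lookarounds would need to be adjusted accordingly.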

Wiktor Stribiżew answered Sep 20 '22 07:09
