
Regex.Match whole words

Tags: c#, .net, regex

In C#, I want to use a regular expression to match any of these words:

string keywords = "(shoes|shirt|pants)"; 

I want to find the whole words in the content string. I thought this regex would do that:

if (Regex.Match(content, keywords + "\\s+",
    RegexOptions.Singleline | RegexOptions.IgnoreCase).Success)
{
    // matched
}

but it returns true for words like participants, even though I only want the whole word pants.

How do I match only those literal words?

Kris B, asked Jul 30 '09 20:07



2 Answers

You should add the word boundary anchor \b on both sides of your regex:

\b(shoes|shirt|pants)\b 

In code:

Regex.Match(content, @"\b(shoes|shirt|pants)\b"); 
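To see the difference between the two patterns, here is a minimal sketch that runs the original attempt and the \b version side by side. The class name, method names, and sample strings are made up for illustration; only the patterns come from the question and the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordDemo
{
    // Same alternation as in the question.
    private const string Keywords = "(shoes|shirt|pants)";

    // Original attempt: keyword followed by whitespace, nothing required before it.
    // RegexOptions.Singleline is omitted because it only changes how '.' matches.
    public static bool MatchesOriginal(string content) =>
        Regex.IsMatch(content, Keywords + @"\s+", RegexOptions.IgnoreCase);

    // Suggested fix: keyword wrapped in word boundaries.
    public static bool MatchesWholeWord(string content) =>
        Regex.IsMatch(content, @"\b" + Keywords + @"\b", RegexOptions.IgnoreCase);

    public static void Main()
    {
        // Hypothetical sample inputs.
        string[] samples = { "the participants left", "blue pants.", "Shirt" };
        foreach (string s in samples)
        {
            Console.WriteLine(
                $"{s,-24} original={MatchesOriginal(s)} wholeWord={MatchesWholeWord(s)}");
        }
    }
}
```

The original pattern matches "participants " because nothing constrains the character before "pants". It also misses a keyword at the end of the string or directly before punctuation, since it insists on trailing whitespace. The \b version avoids both problems.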
Philippe Leybaert, answered Sep 22 '22 17:09


Try:

Regex.Match(content, @"\b" + keywords + @"\b", RegexOptions.Singleline | RegexOptions.IgnoreCase) 

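\b matches at word boundaries: zero-width positions between a word character (\w) and a non-word character (\W), or at the start or end of the string next to a word character.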
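If the keywords come from a list rather than a hand-written string, the pattern can be assembled the same way. The following is only a sketch: KeywordPattern.Build is a hypothetical helper, not part of .NET, and it assumes every keyword starts and ends with a word character (a keyword such as "c#" would not behave as expected with \b).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordPattern
{
    // Hypothetical helper: builds \b(?:kw1|kw2|...)\b from literal keywords.
    public static Regex Build(params string[] keywords)
    {
        // Regex.Escape keeps characters such as '.' or '+' from acting as metacharacters.
        string alternation = string.Join("|", keywords.Select(Regex.Escape));
        return new Regex(@"\b(?:" + alternation + @")\b", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Regex regex = Build("shoes", "shirt", "pants");

        // "Participants" is skipped; the standalone words are reported.
        foreach (Match m in regex.Matches("Participants wore PANTS, a shirt, and shoes."))
        {
            Console.WriteLine($"{m.Value} at index {m.Index}");
        }
    }
}
```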

Ben Lings, answered Sep 18 '22 17:09