In C#, I want to use a regular expression to match any of these words:
string keywords = "(shoes|shirt|pants)";
I want to find the whole words in the content string. I thought this regex
would do that:
if (Regex.Match(content, keywords + "\\s+", RegexOptions.Singleline | RegexOptions.IgnoreCase).Success)
{
    // matched
}
but it returns true for words like participants, even though I only want the whole word pants.
How do I match only those literal words?
To make an alternation such as cat|dog match whole words only, use \b(cat|dog)\b. This tells the regex engine to find a word boundary, then either cat or dog, and then another word boundary.
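The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”. This match is zero-length. There are three different positions that qualify as word boundaries: before the first character in the string, if the first character is a word character; after the last character in the string, if the last character is a word character; and between two characters in the string, where one is a word character and the other is not.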
To match whole exact words, use the word boundary metacharacter \b. This metacharacter matches at the beginning and end of each word, but it doesn't consume anything. In other words, it only checks whether a word starts or ends at this position, that is, whether a word character sits next to a non-word character or next to the start or end of the string.
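To see what "doesn't consume anything" means in practice, here is a minimal sketch that compares the text matched with \b against the text matched with \s+. The class and method names (ZeroWidthDemo, MatchedText) are made up for illustration and are not from the original question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZeroWidthDemo
{
    // Hypothetical helper: returns the exact text the pattern matched,
    // or an empty string when there is no match.
    public static string MatchedText(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Value : "";
    }

    public static void Main()
    {
        // \b is zero-length, so only the word itself ends up in the match.
        Console.WriteLine("[" + MatchedText("blue pants, red shirt", @"\bpants\b") + "]");

        // \s+ consumes characters, so the trailing space becomes part of the match.
        Console.WriteLine("[" + MatchedText("blue pants and shirt", @"pants\s+") + "]");
    }
}
```

The first call yields the five characters of the word alone, while the second also includes the space that follows it.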
A word boundary, in most regex dialects, is a position between \w and \W (non-word char), or at the beginning or end of a string if it begins or ends (respectively) with a word character ([0-9A-Za-z_]; in .NET, \w also covers Unicode letters and digits unless RegexOptions.ECMAScript is set). So, in the string "-12", it would match before the 1 or after the 2.
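The "-12" case can be checked directly by asking the engine where \b matches. This is only a sketch; BoundaryPositions and Find are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BoundaryPositions
{
    // Hypothetical helper: returns the index of every position where \b matches.
    public static List<int> Find(string input)
    {
        var positions = new List<int>();
        foreach (Match m in Regex.Matches(input, @"\b"))
        {
            // Each match is empty; only its position is of interest.
            positions.Add(m.Index);
        }
        return positions;
    }

    public static void Main()
    {
        // Index 1 lies between '-' and '1'; index 3 is the end of the string, after '2'.
        Console.WriteLine(string.Join(", ", Find("-12")));   // prints 1, 3
    }
}
```

There is no boundary at index 0 because the first character, '-', is not a word character.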
You should add the word boundary anchor to your regex:
\b(shoes|shirt|pants)\b
In code:
Regex.Match(content, @"\b(shoes|shirt|pants)\b");
Try
Regex.Match(content, @"\b" + keywords + @"\b", RegexOptions.Singleline | RegexOptions.IgnoreCase)
\b matches on word boundaries. See here for more details.
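To compare the pattern from the question with the \b version side by side, something like the following should work. It reuses the keywords string and the options from the question; the class name, method names and sample sentences are invented for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordDemo
{
    private const string Keywords = "(shoes|shirt|pants)";

    // Pattern from the question: a keyword followed by whitespace, with nothing anchoring its start.
    public static bool OriginalMatch(string content) =>
        Regex.IsMatch(content, Keywords + "\\s+", RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Pattern from the answers: the keyword wrapped in word boundaries.
    public static bool WholeWordMatch(string content) =>
        Regex.IsMatch(content, @"\b" + Keywords + @"\b", RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static void Main()
    {
        // Sample sentences are assumptions chosen to show both failure modes.
        string[] samples =
        {
            "The participants arrived early",
            "I bought new Pants.",
            "No clothing here"
        };

        foreach (string s in samples)
        {
            Console.WriteLine($"{s}: original={OriginalMatch(s)}, whole-word={WholeWordMatch(s)}");
        }
    }
}
```

The original pattern goes wrong in both directions: it accepts "participants " because nothing stops the match from starting mid-word, and it rejects "Pants." because the keyword is followed by a period rather than whitespace. The \b version handles both cases as intended.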