
Regex to match a whole word with special characters not working? [duplicate]

I was going through this question: C#, Regex.Match whole words

It says that to match a whole word you should use "\bpattern\b". This works fine for whole words without special characters, since \b is defined in terms of word characters only.

I need an expression that also matches words containing special characters. My code is as follows:

using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string str = Regex.Escape("Hi temp% dkfsfdf hi");
        string pattern = Regex.Escape("temp%");
        var matches = Regex.Matches(str, "\\b" + pattern + "\\b", RegexOptions.IgnoreCase);
        int count = matches.Count;
    }
}

But it fails because of the %. Is there a workaround for this? The pattern can contain other special characters too, such as a space, '(' or ')'.

Gurucharan Balakuntla Maheshku asked Nov 24 '11 12:11



2 Answers

If the pattern begins or ends with a non-word character, you cannot rely on \b. You can use the following instead:

@"(?<=^|\s)" + pattern + @"(?=\s|$)"

Edit: As Tim mentioned in the comments, your regex fails precisely because \b does not match between % and the whitespace next to it: both are non-word characters. \b matches only at a boundary between a word character and either a non-word character or the start or end of the string.

See more on word boundaries here.
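To make the boundary behaviour concrete, here is a minimal sketch using the input string from the question. The class and method names (WordBoundaryDemo, MatchesWithWordBoundaries) are made up for illustration and are not from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordBoundaryDemo
{
    // Hypothetical helper: true if `literal` occurs with \b on both sides.
    public static bool MatchesWithWordBoundaries(string input, string literal)
    {
        string pattern = @"\b" + Regex.Escape(literal) + @"\b";
        return Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
    }

    static void Main()
    {
        const string input = "Hi temp% dkfsfdf hi";

        // "temp" ends in a word character and is followed by '%',
        // so there is a word boundary after it.
        Console.WriteLine(MatchesWithWordBoundaries(input, "temp"));   // True

        // "temp%" ends in '%' and is followed by a space: two non-word
        // characters in a row, so the trailing \b cannot match.
        Console.WriteLine(MatchesWithWordBoundaries(input, "temp%"));  // False
    }
}
```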

Explanation

@"
(?<=        # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
               # Match either the regular expression below (attempting the next alternative only if this one fails)
      ^           # Assert position at the beginning of the string
   |           # Or match regular expression number 2 below (the entire group fails if this one fails to match)
      \s          # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
)
temp%       # Match the characters “temp%” literally
(?=         # Assert that the regex below can be matched, starting at this position (positive lookahead)
               # Match either the regular expression below (attempting the next alternative only if this one fails)
      \s          # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   |           # Or match regular expression number 2 below (the entire group fails if this one fails to match)
      $           # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
"
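As a rough sketch, the lookaround pattern above could be wrapped in a small helper like the one below. The names WholeTokenSearch and CountWholeMatches are assumptions for illustration only; the pattern itself is the one from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class WholeTokenSearch
{
    // Hypothetical helper: counts occurrences of `literal` that are delimited
    // by whitespace or by the start/end of the input.
    public static int CountWholeMatches(string input, string literal)
    {
        // Escape only the literal being searched for, never the input.
        string pattern = @"(?<=^|\s)" + Regex.Escape(literal) + @"(?=\s|$)";
        return Regex.Matches(input, pattern, RegexOptions.IgnoreCase).Count;
    }

    static void Main()
    {
        Console.WriteLine(CountWholeMatches("Hi temp% dkfsfdf hi", "temp%"));   // 1
        Console.WriteLine(CountWholeMatches("Hi temp%x dkfsfdf hi", "temp%"));  // 0
        Console.WriteLine(CountWholeMatches("Hi temp% dkfsfdf hi", "hi"));      // 2
        Console.WriteLine(CountWholeMatches("call (hi) now", "(hi)"));          // 1
    }
}
```

Note that this treats only whitespace as a delimiter, so "temp%," (with a trailing comma) would not count as a whole match. If punctuation should also end a word, the lookahead and lookbehind would need to be widened accordingly.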
Narendra Yadala answered Nov 15 '22 00:11


If the pattern can contain characters that are special to Regex, run it through Regex.Escape first.

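You did that, but do not escape the string you are searching through; there is no need. Escaping the input only changes the text being searched (Regex.Escape also puts a backslash before each space, for example).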
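The effect of escaping the input can be seen in a small sketch like this one. It reuses the whitespace lookaround pattern from the other answer; the names EscapeOnlyThePattern and Count are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeOnlyThePattern
{
    // Hypothetical helper: escapes the literal (the pattern side) only.
    public static int Count(string input, string literal)
    {
        string pattern = @"(?<=^|\s)" + Regex.Escape(literal) + @"(?=\s|$)";
        return Regex.Matches(input, pattern, RegexOptions.IgnoreCase).Count;
    }

    static void Main()
    {
        const string input = "Hi temp% dkfsfdf hi";

        // Regex.Escape is meant for patterns. Applied to the input, it
        // inserts a backslash before each space and alters the text.
        string escapedInput = Regex.Escape(input);
        Console.WriteLine(escapedInput);                  // Hi\ temp%\ dkfsfdf\ hi

        Console.WriteLine(Count(input, "temp%"));         // 1
        // In the escaped input "temp%" is followed by '\', not whitespace.
        Console.WriteLine(Count(escapedInput, "temp%"));  // 0
    }
}
```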

Hans Kesting answered Nov 15 '22 00:11