
Why doesn't \b match a word when using a .NET regex?

Tags:

c#

.net

regex

To review regular expressions, I read this tutorial. The tutorial mentions that \b matches a word boundary (a position between a \w and a \W character). It also links to a download of Expresso, a program that helps you build and test regular expressions.

So I created my regular expression in Expresso, and I do indeed get a match. But when I copy the same regex into Visual Studio, I do not get a match.



Why am I not getting a match? In the Immediate window I am showing the contents of the variable output. In Expresso I get a match, but in Visual Studio I don't. Why?

Asked Jun 19 '12 15:06 by Tono Nam


People also ask

How do you match everything after a word in regex?

If you want . to match really everything, including newlines, you need to enable "dot-matches-all" mode in your regex engine of choice (for example, the re.DOTALL flag in Python, or /s in PCRE).
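In .NET, the dot-matches-all mode is RegexOptions.Singleline (despite its name, it only changes what . matches). A minimal C# sketch; the class name DotAllDemo, its Matches helper, and the sample input are illustrative assumptions:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotAllDemo
{
    // Hypothetical helper: does "start.*end" match, with or without dot-matches-all?
    public static bool Matches(string input, bool dotMatchesAll)
    {
        RegexOptions options = dotMatchesAll ? RegexOptions.Singleline : RegexOptions.None;
        return Regex.IsMatch(input, "start.*end", options);
    }

    public static void Main()
    {
        const string text = "start\nend";
        Console.WriteLine(Matches(text, dotMatchesAll: false)); // False: . does not match \n by default
        Console.WriteLine(Matches(text, dotMatchesAll: true));  // True: Singleline lets . match \n
    }
}
```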

What does \b do in regex?

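The \b metacharacter is a zero-width assertion that matches at a word boundary, i.e. at the beginning or end of a word: a position between a word character (\w) and a non-word character (\W), or between a word character and the start or end of the string.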
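The word-boundary rule can be checked with a short C# sketch. It assumes a hypothetical helper, CountWholeWord, that wraps the searched word in \b on both sides, using verbatim @"\b" literals so the compiler leaves the backslash alone:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Hypothetical helper: counts occurrences of `word` that stand alone as a whole word.
    public static int CountWholeWord(string input, string word)
    {
        // Regex.Escape guards against regex metacharacters inside `word`.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        // "cat" inside "concat" is preceded by 'n' (a \w character), so there is no boundary there.
        Console.WriteLine(CountWholeWord("cat concat cat.", "cat")); // 2
    }
}
```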

Does \b work in C#?

Yes. The catch is that in a regular C# string literal, \b is the backspace character (U+0008), not the regex \b. You need to either use "\\b" to embed it in a C# string literal, or use a verbatim string literal: @"\b".
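The three literals can be compared directly. This sketch (the class and field names are illustrative) shows that "\b" compiles to a single control character, while @"\b" and "\\b" both produce the two characters the regex engine needs:

```csharp
using System;

public static class EscapeDemo
{
    public static readonly string Regular = "\b";   // C# escape sequence: one char, U+0008 (backspace)
    public static readonly string Verbatim = @"\b"; // verbatim literal: two chars, '\' and 'b'
    public static readonly string Doubled = "\\b";  // escaped backslash: also '\' and 'b'

    public static void Main()
    {
        Console.WriteLine(Regular.Length);      // 1
        Console.WriteLine((int)Regular[0]);     // 8
        Console.WriteLine(Verbatim.Length);     // 2
        Console.WriteLine(Verbatim == Doubled); // True
    }
}
```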

What regex does .NET use?

In .NET, regular expression patterns are defined by a special syntax, or language, that is compatible with Perl 5 regular expressions and adds some features of its own, such as right-to-left matching. For more information, see Regular Expression Language - Quick Reference.
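Right-to-left matching is exposed through RegexOptions.RightToLeft. A small sketch; the helper names FirstNumber and LastNumber and the sample string are illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RightToLeftDemo
{
    // Default scan direction: the leftmost run of digits is found first.
    public static string FirstNumber(string s) => Regex.Match(s, @"\d+").Value;

    // RightToLeft starts scanning at the end of the input, so the rightmost run is found first.
    public static string LastNumber(string s) =>
        Regex.Match(s, @"\d+", RegexOptions.RightToLeft).Value;

    public static void Main()
    {
        Console.WriteLine(FirstNumber("abc123def456")); // 123
        Console.WriteLine(LastNumber("abc123def456"));  // 456
    }
}
```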


2 Answers

The C# language and .NET regular expressions each have their own distinct set of backslash escape sequences. In a regular C# string literal, the compiler intercepts "\b" and converts it into an ASCII backspace character (U+0008), so the Regex class never sees a backslash at all. You need to make your string verbatim (prefix it with an at sign) or double the backslash so that it is passed through to Regex, like so:

@"\bCOMPILATION UNIT";

Or

"\\bCOMPILATION UNIT"
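The difference can be reproduced without the debugger. In this sketch the input string is made up for illustration, and the class and property names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompilationUnitDemo
{
    // Made-up input; "COMPILATION UNIT" follows a newline, so \b holds before the 'C'.
    public const string Sample = "listing header\nCOMPILATION UNIT 1";

    // Broken: the compiler turns "\b" into U+0008, so the engine looks for a literal backspace.
    public static bool Broken => Regex.IsMatch(Sample, "\bCOMPILATION UNIT");

    // Fixed: both literals hand the two characters '\' 'b' (a word boundary) to Regex.
    public static bool Verbatim => Regex.IsMatch(Sample, @"\bCOMPILATION UNIT");
    public static bool Doubled => Regex.IsMatch(Sample, "\\bCOMPILATION UNIT");

    public static void Main()
    {
        Console.WriteLine($"{Broken} {Verbatim} {Doubled}"); // False True True
    }
}
```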

Admittedly, the .NET Regex documentation does not make this clear; it took me a while to figure it out at first, too.

Fun fact: \r and \n (carriage return and line feed, respectively), along with some other escapes such as \t, are recognized by both the Regex engine and the C# compiler, so the end result is the same even though the compiled string is different.
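A short C# sketch of that point; the class name, property names, and input text are illustrative. The two patterns are different strings, yet both match the same CR/LF pair:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SharedEscapeDemo
{
    const string Input = "line one\r\nline two";

    // The compiler converts \r\n into real CR and LF characters; the engine matches them literally.
    public static bool CompilerEscaped => Regex.IsMatch(Input, "one\r\nline");

    // The verbatim pattern keeps the backslashes; the regex engine interprets \r and \n itself.
    public static bool RegexEscaped => Regex.IsMatch(Input, @"one\r\nline");

    public static void Main()
    {
        Console.WriteLine("\r\n".Length);  // 2: two control characters
        Console.WriteLine(@"\r\n".Length); // 4: '\', 'r', '\', 'n'
        Console.WriteLine($"{CompilerEscaped} {RegexEscaped}"); // True True
    }
}
```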

Answered Oct 08 '22 10:10 by Dai


You should use @"\bCOMPILATION UNIT". This is a verbatim string literal. When you write "\b" instead, the compiler parses \b into a special character (backspace). You can also write "\\b", whose double backslash is parsed into a single real backslash, but it's generally easier to just use verbatim literals when dealing with regex.
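The convenience shows up once a pattern itself contains escaped backslashes. In this sketch (pattern, class name, and input are illustrative), matching a literal backslash followed by digits takes three backslashes in a verbatim literal but six in a regular one:

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimDemo
{
    // Regex: a literal backslash (\\) followed by one or more digits (\d+).
    public const string VerbatimPattern = @"\\\d+";  // same text you would type into a regex tester
    public const string RegularPattern = "\\\\\\d+"; // every regex backslash doubled for the compiler

    public static void Main()
    {
        Console.WriteLine(VerbatimPattern == RegularPattern);                   // True
        Console.WriteLine(Regex.Match(@"C:\logs\2024", VerbatimPattern).Value); // \2024
    }
}
```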

Answered Oct 08 '22 10:10 by Tim S.