
.NET Regex dot character matches carriage return?

Tags: .net, regex

Every single flavor of regex I have ever used has always had the "." character match everything but a new line (\r or \n)... unless, of course, you enable the single-line flag.

So when I tried the following C# code I was shocked:

Regex rgx = new Regex(".");
if (rgx.Match("\r\n").Success)
  MessageBox.Show("There is something rotten in the state of Redmond!");

It showed the message. Just to make sure I wasn't going insane, I tried the following JavaScript code:

if (/./.test("\r\n"))
  alert("Something's wrong with JavaScript too.");

The JavaScript didn't show the message, meaning it's working exactly as it should.

Apparently, the "." character in .NET is matching the "\r" character. I checked the documentation to see if they mention anything about it:

Wildcard: Matches any single character except \n.

Wow... since when does a Regex flavor ever have the dot match a carriage return? You would think .NET would behave like all the rest of the Regex flavors... especially because it's in a Windows environment which uses "\r\n" as line delimiters.

Is there any flag/setting I can enable to make it work as it does in other Regex flavors? Are there any alternative solutions which don't involve replacing all . characters with [^\r\n]?

Asked Feb 17 '10 at 16:02 by Senseful



1 Answer

I ran into this same issue when writing Regex Hero. It is a little bizarre. I blogged about the issue here. And that led to me adding a feature to the tester to enable/disable CRLFs. Anyway, for some reason Microsoft chose to treat only \n (line feed) as the line ending in regular expressions, so \r is just another character as far as the dot is concerned.

(UPDATE) The reason must be related to this:

Microsoft .NET Framework regular expressions incorporate the most popular features of other regular expression implementations such as those in Perl and awk. Designed to be compatible with Perl 5 regular expressions, .NET Framework regular expressions include features not yet seen in other implementations, such as right-to-left matching and on-the-fly compilation. http://msdn.microsoft.com/en-us/library/hs600312.aspx

And as Igor noted, Perl has the same behavior.

Now, the Singleline and Multiline RegexOptions change behavior around dots and line feeds. You can enable the Singleline RegexOption so that the dot matches line feeds (\n) as well. And you can enable the Multiline RegexOption so that ^ and $ mark the beginning and end of every line (where lines are delimited by \n only, so a trailing \r counts as part of the line). But there is no option that changes the inherent behavior of the dot (.) so that it excludes both \r and \n; for that you still need an explicit character class such as [^\r\n].
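To make the options above concrete, here is a minimal console sketch of how I understand the behavior. The class name DotDemo and the sample input are made up for illustration; the only APIs used are Regex.IsMatch, Regex.Match and RegexOptions from System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

class DotDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        string input = "ab\r\ncd"; // sample text with a Windows-style line break

        // Default options: "." matches any character except \n, so \r matches.
        Console.WriteLine(Regex.IsMatch("\r", "."));  // True
        Console.WriteLine(Regex.IsMatch("\n", "."));  // False

        // Singleline: "." matches \n as well.
        Console.WriteLine(Regex.IsMatch("\n", ".", RegexOptions.Singleline)); // True

        // Multiline: $ matches just before \n, so the \r ends up inside the match.
        Match wholeLine = Regex.Match(input, "^.*$", RegexOptions.Multiline);
        Console.WriteLine(wholeLine.Value == "ab\r"); // True

        // Workaround: use an explicit negated class instead of the dot.
        Match trimmed = Regex.Match(input, @"^[^\r\n]*", RegexOptions.Multiline);
        Console.WriteLine(trimmed.Value == "ab");     // True
    }
}
```

The negated class is exactly what the question hoped to avoid, but as far as I know it is the straightforward way to get the "everything except \r and \n" behavior in .NET.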

Answered Sep 24 '22 at 08:09 by Steve Wortham