.NET's Regex class and newline

Tags: c#, .net, regex

Why doesn't the .NET regex treat \n as an end-of-line character?

Sample code:

using System;
using System.Text.RegularExpressions;

string[] words = new string[] { "ab1", "ab2\n", "ab3\n\n", "ab4\r", "ab5\r\n", "ab6\n\r" };
Regex regex = new Regex("^[a-z0-9]+$");
foreach (var word in words)
{
    Console.WriteLine("{0} - {1}", word, regex.IsMatch(word));
}

And this is the output I get:

ab1 - True
ab2
 - True
ab3

 - False
 - False
ab5
 - False
ab6
 - False

Why does the regex match ab2\n?

Update: I don't think Multiline is a good solution. I want to validate a login so that it contains only the specified characters, and it must be a single line. If I pass RegexOptions.Multiline to the constructor, ab1, ab2, ab3 and ab6 match the expression, while ab4 and ab5 don't.

asked Jun 12 '09 20:06 by empi


2 Answers

If the string ends with a line break, RegexOptions.Multiline will not help: $ still matches just before that final line break, so the trailing newline is effectively ignored.

If you want the match to end only at the very end of the string, regardless of any line breaks, use \z:

Regex regex = new Regex(@"^[a-z0-9]+\z");

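\z behaves the same with RegexOptions.Multiline and RegexOptions.Singleline. Note, however, that Multiline also lets ^ match right after any \n, so for single-line validation leave Multiline off (or anchor the start with \A).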
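A minimal sketch of the difference, assuming a C# 9+ console project (top-level statements). It runs the question's sample words through $ and \z, then shows how RegexOptions.Multiline lets ^ match after an embedded \n. The variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's sample inputs.
string[] words = { "ab1", "ab2\n", "ab3\n\n", "ab4\r", "ab5\r\n", "ab6\n\r" };

// $ (no options): matches at the end of the string OR just before a final \n.
var dollar = new Regex("^[a-z0-9]+$");
// \z: matches only at the very end of the string.
var endOnly = new Regex(@"^[a-z0-9]+\z");

foreach (var word in words)
{
    // Escape control characters so each input prints on a single line.
    string shown = word.Replace("\r", "\\r").Replace("\n", "\\n");
    Console.WriteLine($"{shown,-10} $: {dollar.IsMatch(word),-5}  \\z: {endOnly.IsMatch(word)}");
}

// Caveat: with Multiline, ^ also matches right after any \n,
// so a value containing a line break can still pass.
var multi = new Regex(@"^[a-z0-9]+\z", RegexOptions.Multiline);
Console.WriteLine(multi.IsMatch("bad!\nab1"));   // True
Console.WriteLine(endOnly.IsMatch("bad!\nab1")); // False
```

Only ab1 passes the \z version; ab2\n passes the $ version because of the trailing \n.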

answered Sep 19 '22 14:09 by Remco Eissing


The .NET regex engine does treat \n as end-of-line. That's a problem if your string has Windows-style \r\n line breaks: with RegexOptions.Multiline turned on, $ matches between \r and \n rather than before \r.
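A small sketch of the \r\n problem, assuming C# 9+ top-level statements. The optional-\r lookahead is one common workaround and is not taken from the answer itself; the sample text is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Windows-style text: every line ends with \r\n.
string text = "ab1\r\nab2\r\n";

// In Multiline mode $ matches only before \n, i.e. between \r and \n,
// so the \r left in front of it blocks the match on every line.
var plain = new Regex("^[a-z0-9]+$", RegexOptions.Multiline);
Console.WriteLine(plain.Matches(text).Count);   // 0

// Workaround: allow an optional \r before $.
// The lookahead keeps the \r out of the matched value.
var crlfAware = new Regex(@"^[a-z0-9]+(?=\r?$)", RegexOptions.Multiline);
foreach (Match m in crlfAware.Matches(text))
{
    Console.WriteLine(m.Value);                 // ab1, then ab2
}
```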

$ also matches at the very end of the string, just like \z. The difference is that \z can match only at the very end of the string, while $ also matches before a trailing \n. With RegexOptions.Multiline, $ also matches before any \n.
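To see exactly where each anchor can match, this sketch (C# 9+ top-level statements; the strings are arbitrary examples) prints the positions of the zero-width matches.

```csharp
using System;
using System.Text.RegularExpressions;

string s = "ab2\n";

// $ without options: the first position it matches is just before the trailing \n.
Console.WriteLine(Regex.Match(s, "$").Index);      // 3
// \z: only the very end of the string.
Console.WriteLine(Regex.Match(s, @"\z").Index);    // 4

// With Multiline, $ matches before every \n, not just a trailing one.
string two = "a\nb\n";
foreach (Match m in Regex.Matches(two, "$", RegexOptions.Multiline))
{
    Console.Write(m.Index + " ");                  // 1 3 4
}
Console.WriteLine();
```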

If you're having trouble with line breaks, one trick is to first do a search-and-replace that removes every \r, so that all your lines end with \n only.
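A minimal sketch of that normalization step, assuming C# 9+ top-level statements. StripCarriageReturns is a hypothetical helper name chosen for illustration; it simply wraps string.Replace.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: remove every \r so lines end with \n only.
static string StripCarriageReturns(string input) => input.Replace("\r", "");

string windowsText = "ab1\r\nab2\r\nab3";
string normalized = StripCarriageReturns(windowsText);

// Before normalization only the last line (no \r) matches;
// afterwards Multiline ^...$ matches every line.
var line = new Regex("^[a-z0-9]+$", RegexOptions.Multiline);
Console.WriteLine(line.Matches(windowsText).Count);  // 1
Console.WriteLine(line.Matches(normalized).Count);   // 3
```

This suits processing multi-line text. For validating a single value such as a login, stripping \r would let "ab4\r" through, so \z is the better fit there.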

answered Sep 18 '22 14:09 by Jan Goyvaerts