 

Treating \r as \n in C# regex

Tags:

c#

regex

I have a C# function that finds patterns of text inside an input and does some processing. (I am using version 3.5 of the .NET Framework.)

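public void func(string s)
{
    // Verbatim string (@) so that \s reaches the regex engine instead of
    // being rejected by the compiler as an unknown C# escape sequence.
    Regex r = new Regex(@"^\s*Pattern\s*$", RegexOptions.Multiline | RegexOptions.ExplicitCapture);
    Match m = r.Match(s);
    //Do something with m
}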

A use of the function might look like this

string s = "Pattern \n Pattern \n non-Pattern";
func(s);

However, I am finding that sometimes my input is looking more like this

string s = "Pattern \r Pattern \r non-Pattern";
func(s);

And it is not being matched. Is there a way to have \r treated like \n within the regex? I figure I could always just replace every \r with \n, but I was hoping to minimize operations by getting the regex to do it all at once.

Asked by MarkB42, May 14 '13 17:05





2 Answers

Unfortunately, when I have run into similar situations, the only solution I found that works is to do two passes with the regex (which you were hoping to avoid). The first pass normalizes the line endings, and the second pass does the search as normal. I could not find any way to get Multiline to trigger on a lone \r.

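public void func(string s)
{
    // Pass 1: normalize every line ending (\r\n, \n\r, \n or a lone \r) to \n,
    // because in Multiline mode .NET's ^ and $ only recognize \n as a line break.
    s = Regex.Replace(s, @"\r\n|\n\r|\n|\r", "\n");
    // Pass 2: search as normal (verbatim string so \s is passed to the regex).
    Regex r = new Regex(@"^\s*Pattern\s*$", RegexOptions.Multiline | RegexOptions.ExplicitCapture);
    Match m = r.Match(s);
    //Do something with m
}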
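If the normalization pass does not have to be a regex, plain `string.Replace` can do the same job. This is a minimal sketch, assuming only the common line-ending forms (`\r\n`, a lone `\n`, a lone `\r`) need handling; unlike the regex above it does not treat `\n\r` as a single break. `LineEndings`, `Normalize` and `CountPatternLines` are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineEndings
{
    // Hypothetical helper: converts \r\n and any lone \r to \n.
    // "\r\n" is replaced first so a Windows line ending becomes one \n
    // rather than two line breaks.
    public static string Normalize(string s)
    {
        return s.Replace("\r\n", "\n").Replace('\r', '\n');
    }

    // Hypothetical helper: counts lines that contain only "Pattern" and whitespace.
    public static int CountPatternLines(string s)
    {
        Regex r = new Regex(@"^\s*Pattern\s*$", RegexOptions.Multiline);
        return r.Matches(Normalize(s)).Count;
    }
}
```

With this helper, `"Pattern \r Pattern \r non-Pattern"` and `"Pattern \r\n Pattern \r\n non-Pattern"` both yield two matching lines, just like the `\n`-separated input in the question.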
Answered by Scott Chamberlain, Oct 05 '22 15:10



According to the documentation page "Anchors in Regular Expressions":

  • ^ in Multiline mode matches the beginning of the input string, or the start of a line (that is, just after \n).
  • $ in Multiline mode matches the end of the input string, or just before \n.
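A quick way to see this behaviour is to test a line that ends in `\r` against one that ends in `\n`. This is a small sketch; `AnchorDemo` and `HasPatternLine` are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: true if some line of the input is exactly "Pattern",
    // using the built-in Multiline anchors.
    public static bool HasPatternLine(string input)
    {
        return Regex.IsMatch(input, "^Pattern$", RegexOptions.Multiline);
    }
}
```

For example, `HasPatternLine("Pattern\nnext")` and `HasPatternLine("first\nPattern")` return `true`, while `HasPatternLine("Pattern\rnext")` and `HasPatternLine("first\rPattern")` return `false`, because neither `$` nor `^` treats `\r` as a line boundary.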

If you want to redefine the anchors so that both \r and \n delimit a line, you have to simulate them with a look-behind and a look-ahead:

  • ^ should be simulated with (?<=\A|[\r\n])
  • $ should be simulated with (?=\Z|[\r\n])

Note that with this simulation, the string \r\n on its own contains 3 starts of line and 3 ends of line. One start and one end are defined by the start and end of the string; the other 2 starts and 2 ends are defined by the \r and the \n. In other words, \r\n is treated as two line breaks with an empty line between them.
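Plugging the simulated anchors into the question's pattern might look like the sketch below. It assumes the goal is simply to find lines containing only `Pattern` and whitespace; `CrLineRegex` and `FindPatternLines` are hypothetical names. Because the anchors are now custom look-arounds, `RegexOptions.Multiline` is no longer needed.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CrLineRegex
{
    // ^ simulated by (?<=\A|[\r\n]): start of input, or just after \r or \n.
    // $ simulated by (?=\Z|[\r\n]): end of input, or just before \r or \n.
    private static readonly Regex LinePattern = new Regex(
        @"(?<=\A|[\r\n])\s*Pattern\s*(?=\Z|[\r\n])",
        RegexOptions.ExplicitCapture);

    // Hypothetical helper: returns the trimmed text of every matching line.
    public static List<string> FindPatternLines(string s)
    {
        List<string> result = new List<string>();
        foreach (Match m in LinePattern.Matches(s))
        {
            result.Add(m.Value.Trim());
        }
        return result;
    }
}
```

With this, `FindPatternLines("Pattern \r Pattern \r non-Pattern")` returns two entries, each `"Pattern"`, and input separated by `\n` or `\r\n` gives the same result, all in a single regex pass.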

Answered by nhahtdh, Oct 05 '22 14:10
