Normalize newlines in C#

Tags: c#, .net

I have a data stream that may contain \r, \n, \r\n, \n\r or any combination of them. Is there a simple way to normalize the data to make all of them simply become \r\n pairs to make display more consistent?

So something that would yield this kind of translation table:

\r     --> \r\n
\n     --> \r\n
\n\n   --> \r\n\r\n
\n\r   --> \r\n
\r\n   --> \r\n
\r\n\n --> \r\n\r\n
ctacke asked Sep 26 '08 17:09


5 Answers

I believe this will do what you need:

using System.Text.RegularExpressions;
// ...
string normalized = Regex.Replace(originalString, @"\r\n|\n\r|\n|\r", "\r\n");

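I'm not 100% sure of the exact syntax, and I don't have a .NET compiler handy to check. I wrote it in Perl and converted it into (hopefully correct) C#. The only real trick is to match "\r\n" and "\n\r" first, so that a two-character pair is consumed as one break before the single-character alternatives get a chance to match.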

To apply it to an entire stream, just run it on chunks of input, taking care that a pair such as \r\n is not split across two chunks. (You could do this with a stream wrapper if you want.)
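The stream wrapper idea could look something like the sketch below. It is not part of the original answer: the NewlineNormalizer class and its Normalize method are made-up names, and it assumes the data is available through a TextReader. It reads one character at a time and holds back a pending \r or \n until the next character shows whether the two form a pair, so buffer boundaries cannot split a break.

```csharp
using System.IO;

public static class NewlineNormalizer
{
    // Hypothetical helper (not from the answer above).
    // Copies text from reader to writer, writing "\r\n" for every line break.
    // A break is "\r\n", "\n\r", a lone "\r", or a lone "\n", taken greedily
    // from left to right, like the regex alternation \r\n|\n\r|\n|\r.
    public static void Normalize(TextReader reader, TextWriter writer)
    {
        int pending = -1; // a '\r' or '\n' seen but not yet resolved; -1 if none
        int c;
        while ((c = reader.Read()) != -1)
        {
            if (pending != -1)
            {
                // The pending character is a break no matter what follows.
                writer.Write("\r\n");
                bool completesPair = (c == '\r' || c == '\n') && c != pending;
                pending = -1;
                if (completesPair)
                {
                    continue; // second half of \r\n or \n\r, already accounted for
                }
            }

            if (c == '\r' || c == '\n')
            {
                pending = c; // wait for the next character before deciding
            }
            else
            {
                writer.Write((char)c);
            }
        }

        if (pending != -1)
        {
            writer.Write("\r\n"); // break at the very end of the input
        }
    }
}
```

For a quick try, wrap a string in a StringReader and collect the result in a StringWriter; for real data, a StreamReader over the incoming stream should work the same way.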


The original Perl:

$str =~ s/\r\n|\n\r|\n|\r/\r\n/g;

The test results:

[bash$] ./test.pl
\r -> \r\n
\n -> \r\n
\n\n -> \r\n\r\n
\n\r -> \r\n
\r\n -> \r\n
\r\n\n -> \r\n\r\n

Update: Now converts \n\r to \r\n, though I wouldn't call that normalization.
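For anyone who wants to rerun the table in C# rather than Perl, here is a rough equivalent of the test. The NormalizeTable class and the Show and PrintTable helpers are names invented for this sketch; only the Regex.Replace call comes from the answer itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NormalizeTable
{
    // Same pattern as above: two-character pairs first, then single characters.
    public static string Normalize(string s) =>
        Regex.Replace(s, @"\r\n|\n\r|\n|\r", "\r\n");

    // Makes CR and LF visible by printing them as the text \r and \n.
    public static string Show(string s) =>
        s.Replace("\r", "\\r").Replace("\n", "\\n");

    // Prints one "input -> output" line per row of the question's table.
    public static void PrintTable()
    {
        string[] inputs = { "\r", "\n", "\n\n", "\n\r", "\r\n", "\r\n\n" };
        foreach (string input in inputs)
        {
            Console.WriteLine($"{Show(input)} -> {Show(Normalize(input))}");
        }
    }
}
```

Calling NormalizeTable.PrintTable() from a console program prints the rows in the same order as the question's table, so the two can be compared line by line.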

Derek Park answered Nov 20 '22 11:11


I'm with Jamie Zawinski on RegEx:

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."

For those of us who prefer readability:

  • Step 1

    Replace \r\n with \n

    Replace \n\r with \n (if you really want this; some posters seem to think not)

    Replace \r with \n

  • Step 2

    Replace \n with Environment.NewLine, \r\n, or whatever you need.
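The two steps could be sketched in C# like this, with the target newline passed in so that "Environment.NewLine or \r\n or whatever" becomes a parameter. The NewlineSteps class and NormalizeNewlines method are hypothetical names, not something from the framework.

```csharp
using System;

public static class NewlineSteps
{
    // Hypothetical helper. Step 1 collapses every kind of break to "\n";
    // step 2 expands each "\n" to the requested newline.
    // If no newline is given, Environment.NewLine is used.
    public static string NormalizeNewlines(string text, string newline = null)
    {
        string target = newline ?? Environment.NewLine;

        string collapsed = text
            .Replace("\r\n", "\n")   // step 1a: CRLF pairs
            .Replace("\n\r", "\n")   // step 1b: LFCR pairs (drop this line if unwanted)
            .Replace("\r", "\n");    // step 1c: any remaining lone CR

        return collapsed.Replace("\n", target); // step 2
    }
}
```

Because each Replace is its own pass over the whole string, runs of three or more mixed characters may be grouped differently than a single left-to-right regex pass would group them; the inputs in the question's table come out the same either way.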

Joe answered Nov 20 '22 11:11


Normalise breaks so that they are all \r\n:

var normalisedString =
            sourceString
            .Replace("\r\n", "\n")
            .Replace("\n\r", "\n")
            .Replace("\r", "\n")
            .Replace("\n", "\r\n");
Phil answered Nov 20 '22 12:11


It's a two-step process.
First you convert all the combinations of \r and \n into a single character, say \r.
Then you convert every \r into your target \r\n.

normalized = 
    original.Replace("\r\n", "\r").
             Replace("\n\r", "\r").
             Replace("\n", "\r").
             Replace("\r", "\r\n"); // last step
GDavoli answered Nov 20 '22 12:11


A regex would help. You could do something roughly like this:

(\r\n|\n\n|\n\r|\r|\n) replace with \r\n

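This regex produced the matches below when run against the left side of the table posted, so a replace should normalize. One caveat: the \n\n alternative matches two line feeds as a single break, so a replace would turn \n\n into one \r\n rather than the \r\n\r\n the question asks for. Drop that alternative if blank lines need to survive.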

\r   => \r 
\n   => \n 
\n\n => \n\n 
\n\r => \n\r 
\r\n => \r\n 
\r\n => \r\n 
\n   => \n 
Quintin Robinson answered Nov 20 '22 11:11