
How to read a file into a string with CR/LF preserved?

If I asked the question "how to read a file into a string", the answer would be obvious. However, here is the catch: CR/LF must be preserved.

The problem is that File.ReadAllText strips those characters, and StreamReader.ReadToEnd converted LF into CR for me, which led to a long investigation into where I had a bug in pretty obvious code ;-)

So, in short, if I have a file containing foo\n\r\nbar, I would like to get foo\n\r\nbar (i.e. exactly the same content), not foo bar, foobar, or foo\n\n\nbar. Is there a ready-to-use way to do this in .NET?

The outcome should always be a single string containing the entire file.

greenoldman asked Nov 22 '12 21:11



3 Answers

Are you sure that those methods are the culprits that are stripping out your characters?

I tried to write up a quick test; StreamReader.ReadToEnd preserves all newline characters.

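using System;
using System.IO;
using System.Linq;
using System.Text;

// Round-trip the test string through an in-memory stream.
string str = "foo\n\r\nbar";
using (Stream ms = new MemoryStream(Encoding.ASCII.GetBytes(str)))
using (StreamReader sr = new StreamReader(ms, Encoding.UTF8))
{
    string str2 = sr.ReadToEnd();
    // Print each character code so the line endings are visible.
    Console.WriteLine(string.Join(",", str2.Select(c => (int)c)));
}

// Output: 102,111,111,10,13,10,98,97,114
//           f   o   o \n \r \n  b  a   r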

An identical result is achieved when writing to and reading from a temporary file:

// Same round trip, this time through a real file on disk.
string str = "foo\n\r\nbar";
string temp = Path.GetTempFileName();
File.WriteAllText(temp, str);
string str2 = File.ReadAllText(temp);
Console.WriteLine(string.Join(",", str2.Select(c => (int)c)));

It appears that your newlines are getting lost elsewhere.

Douglas answered Oct 21 '22 10:10


This piece of code will preserve LF and CR:

string r = File.ReadAllText(@".\TestData\TR120119.TRX", Encoding.ASCII);

Jesper answered Oct 21 '22 09:10


"The outcome should always be a single string containing the entire file."

It takes two hops. The first is File.ReadAllBytes(), which gets all the bytes in the file. It doesn't try to translate anything; you get the raw data in the file, so the weirdo line endings are preserved as-is.

But those are bytes, and you asked for a string. So the second hop is to apply Encoding.GetString() to convert the bytes to a string. The one thing you have to do is pick the right Encoding class, the one that matches the encoding used by the program that wrote the file. Given that the file is pretty messed up if it contains \n\r\n sequences, and you didn't document anything else about the file, your best bet is to use Encoding.Default. Tweak as necessary.
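The two hops can be sketched roughly as follows. This is only an illustration, not a framework API: RawFileReader and ReadRaw are hypothetical names, and the demo assumes the file's bytes are plain ASCII, so decoding them as UTF-8 is safe.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class RawFileReader
{
    // Hypothetical helper (not part of .NET): reads the whole file as bytes,
    // then decodes them with the encoding supplied by the caller.
    public static string ReadRaw(string path, Encoding encoding)
    {
        byte[] bytes = File.ReadAllBytes(path); // hop 1: raw bytes, no newline translation
        return encoding.GetString(bytes);       // hop 2: decode only, line endings untouched
    }

    public static void Main()
    {
        string temp = Path.GetTempFileName();
        try
        {
            // Write the mixed line endings from the question as raw bytes.
            File.WriteAllBytes(temp, Encoding.ASCII.GetBytes("foo\n\r\nbar"));

            string text = ReadRaw(temp, Encoding.UTF8);

            // Print each character code so the line endings are visible.
            Console.WriteLine(string.Join(",", text.Select(c => (int)c)));
        }
        finally
        {
            File.Delete(temp);
        }
    }
}
```

Two things are worth checking when adapting this. Encoding.Default is the system ANSI code page on .NET Framework but UTF-8 on .NET Core and later, so passing an explicit encoding is usually clearer. Also, Encoding.GetString() does not remove a byte order mark, so a file that starts with one yields a string that starts with U+FEFF.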

Hans Passant answered Oct 21 '22 09:10