
What is the encoding of the string returned by StreamReader.ReadLine()?

First, let's see the code:

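using System.IO;
using System.Text;

// The encoding of utf8.txt is UTF-8.
using (StreamReader reader = new StreamReader(@"C:\utf8.txt", Encoding.UTF8, true))
{
    string lineFromTxtFile;
    // ReadLine() returns null at end of stream; Peek() > 0 would also stop at a U+0000 character.
    while ((lineFromTxtFile = reader.ReadLine()) != null)
    {
        // What is the encoding of lineFromTxtFile?
    }
}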

As Joel said in his famous article:

"If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly."

So here is my question: what is the encoding of the string lineFromTxtFile? UTF-8 (because it comes from a text file encoded in UTF-8)? Or UTF-16 (because a string in .NET is "Unicode", i.e. UTF-16)?

Thanks.

asked Nov 11 '11 03:11 by jjooeell


People also ask

What does StreamReader return?

The StreamReader.ReadLine() method reads a line of characters from the current stream and returns the data as a string. The StreamReader.ReadLineAsync() method reads a line of characters from the current stream asynchronously and returns the data as a string.

What does StreamReader mean in C#?

StreamReader is designed for character input in a particular encoding, whereas the Stream class is designed for byte input and output. Use StreamReader for reading lines of information from a standard text file.

Is StreamReader thread safe?

By default, a StreamReader is not thread safe.


3 Answers

All .NET string variables are held in memory as UTF-16 (the encoding that Encoding.Unicode represents, little endian). The StreamReader decodes the file's UTF-8 bytes into UTF-16 characters as it reads. Even better, because you know your text file is UTF-8 and told your StreamReader the correct encoding in the constructor, any special characters will be handled correctly.
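As a rough illustration of that decoding step, the sketch below reads UTF-8 bytes through a StreamReader and compares sizes before and after. The class name DecodeDemo, the helper ReadFirstLine, and the sample text "café" are made up for this example; a MemoryStream stands in for the file on disk.

```csharp
using System;
using System.IO;
using System.Text;

public static class DecodeDemo
{
    // Hypothetical helper: decodes UTF-8 bytes through a StreamReader
    // and returns the first line as a .NET string.
    public static string ReadFirstLine(byte[] utf8Bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(utf8Bytes), Encoding.UTF8))
        {
            return reader.ReadLine();
        }
    }

    public static void Main()
    {
        // "café" in UTF-8: the "é" (U+00E9) takes two bytes, 0xC3 0xA9.
        byte[] utf8Bytes = { 0x63, 0x61, 0x66, 0xC3, 0xA9 };

        string line = ReadFirstLine(utf8Bytes);

        Console.WriteLine(utf8Bytes.Length);                    // 5 (bytes in the UTF-8 source)
        Console.WriteLine(line.Length);                         // 4 (UTF-16 code units in the string)
        Console.WriteLine(Encoding.Unicode.GetByteCount(line)); // 8 (bytes if re-encoded as UTF-16)
    }
}
```

The string itself carries no trace of the UTF-8 source: once ReadLine() returns, only the UTF-16 code units remain.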

answered Oct 06 '22 01:10 by Joel Coehoorn


.NET strings are Unicode. Encoding doesn't play a part, then, until you next need to turn the string back into bytes. If you write it out to a file, for example, that is when you specify the output encoding. But since .NET handles everything you do with the string via library calls, it doesn't matter how it's represented in memory.
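To make the "specify the output encoding when you write" point concrete, here is a minimal sketch that writes the same string through a StreamWriter with two different encodings. The names EncodeDemo and WriteWith are invented for this example, and a MemoryStream is used instead of a real file. The encodings are constructed without a byte order mark so that only the text's own bytes appear.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodeDemo
{
    // Hypothetical helper: writes text with the given encoding
    // and returns the bytes that would end up in the file.
    public static byte[] WriteWith(string text, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, encoding))
            {
                writer.Write(text);
            } // disposing the writer flushes it into the stream
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        string text = "caf\u00E9"; // one in-memory string, "café"

        // BOM-less variants, so the output holds only the encoded characters.
        byte[] asUtf8 = WriteWith(text, new UTF8Encoding(false));
        byte[] asUtf16 = WriteWith(text, new UnicodeEncoding(false, false));

        Console.WriteLine(BitConverter.ToString(asUtf8));  // 63-61-66-C3-A9
        Console.WriteLine(BitConverter.ToString(asUtf16)); // 63-00-61-00-66-00-E9-00
    }
}
```

The string is identical in both calls; only the encoding passed to the writer decides which bytes are produced.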

answered Oct 06 '22 01:10 by Jonathon Reinhart


It would be Unicode, because all .NET strings are. Real question: why does it matter?

answered Oct 06 '22 01:10 by Ilia G