First, let's see the code:
// Requires: using System.IO; using System.Text;
// utf8.txt is encoded in UTF-8
StreamReader reader = new StreamReader(@"C:\utf8.txt", Encoding.UTF8, true);
// Peek() returns -1 at the end of the stream, so test for >= 0
while (reader.Peek() >= 0)
{
    // What is the encoding of lineFromTxtFile?
    string lineFromTxtFile = reader.ReadLine();
}
As Joel said in his famous article:
"If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly."
So here is my question: what is the encoding of the string lineFromTxtFile? Is it UTF-8 (because it comes from a text file encoded in UTF-8), or UTF-16 (because a string in .NET is "Unicode", that is, UTF-16)?
Thanks.
The StreamReader.ReadLine() method reads a line of characters from the current stream and returns the data as a string. The StreamReader.ReadLineAsync() method does the same asynchronously.
StreamReader is designed for character input in a particular encoding, whereas the Stream class is designed for byte input and output. Use StreamReader for reading lines of information from a standard text file.
By default, a StreamReader is not thread safe.
All .NET strings are held in memory as UTF-16 code units, the same form that Encoding.Unicode (UTF-16, little endian) writes out as bytes. Even better, because you know your text file is UTF-8 and passed the correct encoding to the StreamReader constructor, any non-ASCII characters will be decoded correctly.
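To make that concrete, here is a minimal sketch that reads UTF-8 bytes through a StreamReader and compares the sizes before and after decoding. It uses a MemoryStream in place of the file from the question, and the names EncodingDemo and ReadFirstLine are made up for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: decodes UTF-8 bytes with a StreamReader,
    // using the same constructor arguments as the question's code.
    public static string ReadFirstLine(byte[] utf8Bytes)
    {
        using (var stream = new MemoryStream(utf8Bytes))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadLine();
        }
    }

    public static void Main()
    {
        // U+00E9 (e with acute accent) followed by a newline, encoded as UTF-8.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("\u00E9\n");
        string line = ReadFirstLine(utf8Bytes);

        // Bytes in the "file": two for U+00E9 plus one for the newline.
        Console.WriteLine(utf8Bytes.Length);
        // Characters in the string: one UTF-16 code unit, newline stripped.
        Console.WriteLine(line.Length);
        // Bytes needed to serialize the string as UTF-16.
        Console.WriteLine(Encoding.Unicode.GetByteCount(line));
    }
}
```

The UTF-8 form only exists in the byte array. Once ReadLine() returns, the string holds decoded characters and no longer remembers which encoding the bytes were in.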
.NET strings are Unicode. Once the text has been read in, encoding plays no part until you need to turn the string back into bytes. If you write it out to a file, for example, you specify the output encoding at that point. Since everything you do with the string goes through library calls, it doesn't matter how it's represented in memory.
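As a rough sketch of that point, the same string can be written out with two different encodings and will produce different bytes each time. The class and method names here are invented for the example, a MemoryStream stands in for a file, and both encodings are constructed without a byte order mark so that only the character bytes show up.

```csharp
using System;
using System.IO;
using System.Text;

public static class OutputEncodingDemo
{
    // Hypothetical helper: writes text through a StreamWriter with the
    // chosen encoding and returns the bytes that reached the stream.
    public static byte[] Write(string text, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, encoding))
            {
                writer.Write(text);
            }
            // ToArray() still works after the writer has closed the stream.
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        string text = "\u20AC"; // the euro sign

        // Encodings created without a BOM (an assumption made for clarity).
        Encoding utf8 = new UTF8Encoding(false);
        Encoding utf16 = new UnicodeEncoding(false, false);

        Console.WriteLine(BitConverter.ToString(Write(text, utf8)));  // E2-82-AC
        Console.WriteLine(BitConverter.ToString(Write(text, utf16))); // AC-20
    }
}
```

The string is identical in both calls. Only the encoding handed to the writer decides what ends up on disk.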
It would be Unicode, because all .NET strings are. Real question: why does it matter?