C# MemoryStream Encoding vs. Encoding.GetChars()

Question

I am trying to copy a byte stream from a database, decode it, and finally display it on a web page. However, I am seeing different behavior depending on how I decode the content (note: I am using the "Western European" encoding, which has a Latin character set and does not support Chinese characters):

var encoding = Encoding.GetEncoding(1252 /*Western European*/);
using (var fileStream = new StreamReader(new MemoryStream(content), encoding))
{
    var str = fileStream.ReadToEnd();
}

Vs.

var encoding = Encoding.GetEncoding(1252 /*Western European*/);
var str = new string(encoding.GetChars(content));

If the content contains Chinese characters, then the first block of code will produce a string like "D$教学而设计的", which is incorrect because the encoding shouldn't support those characters, while the second block will produce "D$æ•™å¦è€Œè®¾è®¡çš„", which is correct as those are all in the Western European character set.

What is the explanation for this difference in behavior?

SLaks · Accepted Answer

By default, StreamReader checks the start of the stream for a byte order mark (BOM) and, when it finds one, switches to the encoding the BOM indicates, even if you passed a different encoding to the constructor.

It sees the UTF-8 BOM in your data and correctly uses UTF-8.
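To confirm that the data really starts with a UTF-8 BOM, you can compare its first bytes against the UTF-8 preamble (EF BB BF). This is only a minimal sketch: BomCheck and StartsWithUtf8Bom are hypothetical names, and content stands for the byte array from the question.

```csharp
using System;
using System.Linq;
using System.Text;

static class BomCheck
{
    // Hypothetical helper: true when the buffer begins with the
    // UTF-8 preamble bytes EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] content)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        return content.Length >= bom.Length
            && content.Take(bom.Length).SequenceEqual(bom);
    }
}
```

If this returns true for your content, BOM detection would explain why StreamReader switched to UTF-8.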

To prevent this behavior, pass false as the third parameter (detectEncodingFromByteOrderMarks):

var fileStream = new StreamReader(new MemoryStream(content), encoding, false);
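The difference can be reproduced with a small self-contained program. It is a sketch under stated assumptions: the sample bytes are made up (a UTF-8 BOM followed by a short UTF-8 string), DecodeDemo and Read are hypothetical names, and ISO-8859-1 (code page 28591) stands in for code page 1252 because 1252 may need an extra encoding provider registered on .NET Core and later. The BOM handling should be the same for either single-byte encoding.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class DecodeDemo
{
    // Hypothetical helper: decodes the buffer through a StreamReader,
    // with BOM detection switched on or off.
    public static string Read(byte[] content, Encoding encoding, bool detectBom)
    {
        using (var reader = new StreamReader(new MemoryStream(content), encoding, detectBom))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Made-up sample data: UTF-8 BOM + "D$" + two Chinese characters.
        byte[] content = Encoding.UTF8.GetPreamble()
            .Concat(Encoding.UTF8.GetBytes("D$\u6559\u5B66"))
            .ToArray();

        // Assumption: ISO-8859-1 is used in place of code page 1252.
        Encoding encoding = Encoding.GetEncoding(28591);

        string detected = Read(content, encoding, true);   // BOM detection on
        string forced = Read(content, encoding, false);    // BOM detection off
        string direct = new string(encoding.GetChars(content));

        // With detection on, the reader follows the BOM and decodes as UTF-8.
        Console.WriteLine(detected == "D$\u6559\u5B66");
        // With detection off, the reader matches the direct GetChars call.
        Console.WriteLine(forced == direct);
    }
}
```

Note that with detection turned off the BOM bytes are decoded like any other bytes, so they show up as extra characters at the start of the string.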

Tags: c#, character-encoding, streamreader

Sidawy
