I am writing text to a TextWriter. I want the UTF-16 Byte Order Mark (BOM) to appear in the output:
public void ProcessRequest(HttpContext context)
{
    context.Response.ContentEncoding = new UnicodeEncoding(true, true);
    WriteStuffToTextWriter(context.Response.Output);
}
Except the output doesn't contain a byte order mark:
HTTP/1.1 200 OK
Server: ASP.NET Development Server/10.0.0.0
Date: Thu, 06 Sep 2012 21:09:23 GMT
X-AspNet-Version: 4.0.30319
Content-Disposition: attachment; filename="Transactions_Calendar_20120906.csv"
Cache-Control: private
Content-Type: text/csv; filename="Transactions_Calendar_20120906.csv"; charset=utf-16BE
Content-Length: 95022
Connection: Close
JobName,ShiftName,6////09////2012 12::::00::::00 АΜ,...
How do I tell a TextWriter to write the encoding marker?
Note: The second parameter of the UnicodeEncoding constructor:
context.Response.ContentEncoding = new UnicodeEncoding(true, true);
is documented as:
byteOrderMark
Type: System.Boolean
true to specify that a Unicode byte order mark is provided; otherwise, false.
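As far as I can tell, that flag only controls what the encoding returns from GetPreamble(); it is up to whoever writes the stream to actually emit those bytes, and GetBytes never adds them. A minimal sketch of that behaviour (the class and method names here are made up for illustration):

```csharp
using System;
using System.Text;

static class PreambleDemo
{
    // Returns the encoding's preamble (its BOM bytes) as hex, e.g. "FE-FF".
    public static string PreambleHex(Encoding encoding) =>
        BitConverter.ToString(encoding.GetPreamble());

    static void Main()
    {
        var withBom = new UnicodeEncoding(bigEndian: true, byteOrderMark: true);
        var withoutBom = new UnicodeEncoding(bigEndian: true, byteOrderMark: false);

        Console.WriteLine(PreambleHex(withBom));    // FE-FF
        Console.WriteLine(PreambleHex(withoutBom)); // empty line: no preamble

        // GetBytes encodes only the characters it is given; no preamble is added.
        Console.WriteLine(BitConverter.ToString(withBom.GetBytes("A"))); // 00-41
    }
}
```

So a writer that never asks the encoding for its preamble will produce output without a BOM, whatever the constructor flag says.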
A: Yes. Since UTF-8 is interpreted as a sequence of bytes, there is no endian problem as there is for encoding forms that use 16-bit or 32-bit code units.
The byte-order mark indicates which order is used, so that applications can immediately decode the content. In the UTF-8 encoding, the presence of the BOM is not essential because, unlike the UTF-16 encodings, there is no alternative sequence of bytes in a character.
1. From The Unicode Standard 5.0: The Unicode Standard also specifies the use of an initial byte order mark (BOM) to explicitly differentiate big-endian or little endian data in some of the Unicode encoding schemes.
String zwnbsp = "\xfeff"; //Zero-width non-breaking space
//The Zero-width non-breaking space character ***is*** the Byte-Order-Mark (BOM).
String s = zwnbsp+"The quick brown fox jumped over the lazy dog.";
writer.Write(s);
At some point I realized how simple the solution is.
I used to think that the Unicode Byte-Order-Mark was some special signature. I used to think I had to carefully decide which byte sequence to output in order to get the correct BOM for each encoding.
But since then I realized that the Byte-Order-Mark is not some special byte sequence that you have to prepend to your file.
The BOM is just a Unicode character. You don't output any bytes; you only output the character U+FEFF. When you write that character, the serializer converts it for you into whatever encoding you're using.
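To convince yourself, you can encode the single character U+FEFF under each encoding and look at the bytes that come out. This is only a sketch; BomBytesDemo and EncodeBom are names invented for the example, and every encoding is constructed with its BOM flag off so that nothing but the character itself is encoded:

```csharp
using System;
using System.Text;

static class BomBytesDemo
{
    // Encodes the single character U+FEFF and returns the resulting bytes as hex.
    public static string EncodeBom(Encoding encoding) =>
        BitConverter.ToString(encoding.GetBytes("\uFEFF"));

    static void Main()
    {
        Console.WriteLine(EncodeBom(new UTF8Encoding(false)));           // EF-BB-BF
        Console.WriteLine(EncodeBom(new UnicodeEncoding(true, false)));  // FE-FF
        Console.WriteLine(EncodeBom(new UnicodeEncoding(false, false))); // FF-FE
        Console.WriteLine(EncodeBom(new UTF32Encoding(true, false)));    // 00-00-FE-FF
        Console.WriteLine(EncodeBom(new UTF32Encoding(false, false)));   // FF-FE-00-00
    }
}
```

Those are the familiar BOM byte sequences, produced by nothing more than encoding one character.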
The character U+FEFF (ZERO WIDTH NO-BREAK SPACE) was chosen for good reason. It's a space, so it has no meaning, and it is zero width, so you shouldn't even see it.
That means that my question is fundamentally flawed. There is no such thing as "writing a byte-order-mark". You just make sure the first character you write out is U+FEFF. In my case I am writing to a TextWriter:
void WriteStuffToTextWriter(TextWriter writer)
{
    String csvExport = GetExportAsCSV();
    writer.Write("\xfeff"); //Output Unicode character U+FEFF as a byte order mark
    writer.Write(csvExport);
}
The TextWriter will handle converting the Unicode character U+FEFF into whatever byte encoding it has been configured to use.
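Here is a small way to check this outside ASP.NET. It is a sketch under assumptions: a StreamWriter over a MemoryStream stands in for context.Response.Output, and TextWriterBomDemo and WriteWithBom are hypothetical names. One caveat: StreamWriter emits the encoding's preamble on its own when the encoding has one, so the encoding here is built with byteOrderMark set to false to avoid getting the mark twice.

```csharp
using System;
using System.IO;
using System.Text;

static class TextWriterBomDemo
{
    // Writes U+FEFF followed by the text through a TextWriter
    // and returns the bytes that ended up in the underlying stream.
    public static byte[] WriteWithBom(string text, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            using (TextWriter writer = new StreamWriter(stream, encoding))
            {
                writer.Write("\uFEFF"); // the first character written is U+FEFF
                writer.Write(text);
            } // disposing the writer flushes it and closes the stream

            return stream.ToArray(); // ToArray still works on a closed MemoryStream
        }
    }

    static void Main()
    {
        // byteOrderMark: false, so StreamWriter does not also emit a preamble itself
        var utf16be = new UnicodeEncoding(bigEndian: true, byteOrderMark: false);
        Console.WriteLine(BitConverter.ToString(WriteWithBom("Hi", utf16be)));
        // FE-FF-00-48-00-69
    }
}
```

The first two bytes are FE FF, the big-endian UTF-16 BOM, followed by the text itself.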
Note: Any code is released into the public domain. No attribution required.