I need to add a UTF-8 byte-order-mark to generated text data on client side. How do I do that?
Using new Blob(['\xEF\xBB\xBF' + content])
yields '"my data"'
, of course.
Neither did '\uBBEF\x22BF'
work (with '\x22' == '"'
being the next character in content
).
Is it possible to prepend the UTF-8 BOM in JavaScript to a generated text?
Yes, I really do need the UTF-8 BOM in this case.
To Add BOM to a UTF-8 file, we can directly write Unicode \ufeff or three bytes 0xEF , 0xBB , 0xBF at the beginning of the UTF-8 file. The Unicode \ufeff represents 0xEF , 0xBB , 0xBF , read this.
The Unicode Standard permits the BOM in UTF-8, but does not require or recommend its use. Byte order has no meaning in UTF-8, so its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8, or that it was converted to UTF-8 from a stream that contained an optional BOM.
Select “Save As” from File menu, go to Save button and open its dropdown menu, select “Save with Encoding…”, choose “Unicode (UTF-8 without signature)”.
Bookmark this question. Show activity on this post. Closed 2 years ago.
Prepend \ufeff
to the string. See http://msdn.microsoft.com/en-us/library/ie/2yfce773(v=vs.94).aspx
See discussion between @jeff-fischer and @casey for details on UTF-8 and UTF-16 and the BOM. What actually makes the above work is that the string \ufeff
is always used to represent the BOM, regardless of UTF-8 or UTF-16 being used.
See p.36 in The Unicode Standard 5.0, Chapter 2 for a detailed explanation. A quote from that page
The endian order entry for UTF-8 in Table 2-4 is marked N/A because UTF-8 code units are 8 bits in size, and the usual machine issues of endian order for larger code units do not apply. The serialized order of the bytes must not depart from the order defined by the UTF- 8 encoding form. Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature.
I had the same issue and this is the solution I came up with:
var blob = new Blob([ new Uint8Array([0xEF, 0xBB, 0xBF]), // UTF-8 BOM "Text", ... // Remaining data ], { type: "text/plain;charset=utf-8" });
Using Uint8Array
prevents the browser from converting those bytes into string (tested on Chrome and Firefox).
You should replace text/plain
with your desired MIME type.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With