Why would I use a Unicode Signature Byte-Order-Mark (BOM)?

Are these obsolete? They seem like the worst idea ever -- embed something in the contents of your file that no one can see, but impacts the file's functionality. I don't understand why I would want one.

Pup asked Jun 25 '09 19:06


People also ask

What is the purpose of the byte order mark?

The byte order mark (BOM) is a piece of information used to signify that a text file employs Unicode encoding, while also communicating the text stream's endianness. The BOM is not interpreted as a logical part of the text stream itself, but is rather an invisible indicator at its head.

What is BOM signature?

The UTF-8 file signature (commonly also called a "BOM") identifies the encoding format rather than the byte order of the document. UTF-8 is a linear sequence of bytes, not a sequence of 2-byte or 4-byte units where the byte order is important.

What is Unicode and bytes?

Unicode can be stored using several different encodings, which translate the character codes into sequences of bytes. The Unicode standard defines three (UTF-8, UTF-16, and UTF-32), and several other encodings exist. UTF-8 and UTF-16 are variable-length; UTF-32 uses a fixed four bytes per code point.


1 Answer

The "BOM" is a holdover from the early days of Unicode, when it was assumed that using Unicode would mean using 16-bit characters. It is completely pointless in an encoding like UTF-8, which has only one byte order. The choice of U+FEFF is also suboptimal for UTF-32: in four bytes it is 00 00 FE FF, and because two of those bytes are identical it cannot distinguish between all possible middle-endian byte orders (to do so would require a BOM encoded with 4 different bytes).
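To make the byte-order point concrete, here is a small Python sketch that encodes U+FEFF under each encoding form. The variable names (`bom_bytes`, `encodings`) are illustrative only; the codec names are Python's standard ones.

```python
# Encode the single code point U+FEFF under each encoding form.
# The explicit "-be"/"-le" codecs are used so Python does not prepend
# a BOM of its own to the output.
bom = "\ufeff"
encodings = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]
bom_bytes = {enc: bom.encode(enc) for enc in encodings}

for enc, raw in bom_bytes.items():
    print(f"{enc:10} {raw.hex(' ')}")

# UTF-32 repeats the 00 byte, so only three distinct byte values appear.
distinct_utf32_bytes = len(set(bom_bytes["utf-32-be"]))
```

UTF-8 yields the same three bytes (EF BB BF) on every platform, so there is no byte order for it to mark. The UTF-32 form contains only three distinct byte values, which is why it cannot tell every permutation of four bytes apart.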

The only reason you'd use one is when sending UTF-16 or UTF-32 data between platforms with different byte orders, but (1) most people use UTF-8 anyway, and (2) the MIME charset parameter provides a better mechanism.
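If you do receive UTF-16 or UTF-32 data with a BOM and no out-of-band charset, a receiver might sniff it roughly as follows. This is a minimal sketch, not a robust detector: `decode_with_bom` and `_BOMS` are hypothetical names, and the UTF-8 fallback is an assumption. The `codecs.BOM_*` constants are part of Python's standard library.

```python
import codecs

# Check the 4-byte signatures first: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Strip a leading BOM, if any, and decode with the encoding it implies."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    # No BOM found: fall back to the caller's default (an assumption here).
    return data.decode(default)
```

Note the inherent ambiguity: UTF-16-LE text whose first character after the BOM is U+0000 would be misread as UTF-32-LE. An explicit charset label, as the answer suggests, avoids this kind of guessing.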

dan04 answered Jun 07 '23 00:06