What is the difference between Big Endian and Little Endian Byte order ?
Both of these seem to be related to Unicode and UTF16. Where exactly do we use this?
Little Endian Byte Order: The least significant byte (the "little end") of the data is placed at the byte with the lowest address. The rest of the data is placed in order in the next three bytes in memory. In these definitions, the data, a 32-bit pattern, is regarded as a 32-bit unsigned integer.
The benefit of little endianness is that a variable can be read as any length using the same address. One benefit of big-endian is that you can read 16-bit and 32-bit values as most humans do; from left to right.
Big endian machine: Stores data big-end first. When looking at multiple bytes, the first byte (lowest address) is the biggest. Little endian machine: Stores data little-end first. When looking at multiple bytes, the first byte is smallest.
Little and big endian are two ways of storing multibyte data-types ( int, float, etc). In little endian machines, last byte of binary representation of the multibyte data-type is stored first. On the other hand, in big endian machines, first byte of binary representation of the multibyte data-type is stored first.
Big-Endian (BE) / Little-Endian (LE) are two ways to organize multi-byte words. For example, when using two bytes to represent a character in UTF-16, there are two ways to represent the character 0x1234
as a string of bytes (0x00-0xFF):
Byte Index: 0 1 --------------------- Big-Endian: 12 34 Little-Endian: 34 12
In order to decide if a text uses UTF-16BE or UTF-16LE, the specification recommends to prepend a Byte Order Mark (BOM) to the string, representing the character U+FEFF. So, if the first two bytes of a UTF-16 encoded text file are FE
, FF
, the encoding is UTF-16BE. For FF
, FE
, it is UTF-16LE.
A visual example: The word "Example" in different encodings (UTF-16 with BOM):
Byte Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ------------------------------------------------------------ ASCII: 45 78 61 6d 70 6c 65 UTF-16BE: FE FF 00 45 00 78 00 61 00 6d 00 70 00 6c 00 65 UTF-16LE: FF FE 45 00 78 00 61 00 6d 00 70 00 6c 00 65 00
For further information, please read the Wikipedia page of Endianness and/or UTF-16.
Ferdinand's answer (and others) are correct, but incomplete.
Big Endian (BE) / Little Endian (LE) have nothing to do with UTF-16 or UTF-32. They existed way before Unicode, and affect how the bytes of numbers get stored in the computer's memory. They depend on the processor.
If you have a number with the value 0x12345678
then in memory it will be represented as 12 34 56 78
(BE) or 78 56 34 12
(LE).
UTF-16 and UTF-32 happen to be represented on 2 respectively 4 bytes, so the order of the bytes respects the ordering that any number follows on that platform.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With