A in UTF-8 is U+0041 LATIN CAPITAL LETTER A. A in ASCII is 065.
How is UTF-8 backwards-compatible with ASCII?
Why:
Because so much existing text and software was already in ASCII, having a backwards-compatible Unicode format made adoption much easier. It's much easier to convert a program to use UTF-8 than to convert it to UTF-16, and the converted program stays backwards-compatible because it still works with ASCII.
How:
ASCII is a 7-bit encoding, but it is always stored in bytes, which are 8 bits. That means 1 bit, the highest one, has always been unused.
UTF-8 simply uses that extra bit to signify non-ASCII characters.
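As a rough illustration of that idea, here is a minimal Python sketch that looks only at the highest bit of each byte to decide whether it is plain ASCII. The helper name is_ascii_byte and the sample string are made up for this example.

```python
def is_ascii_byte(b: int) -> bool:
    """Return True if the highest bit of the byte is 0 (a plain ASCII byte)."""
    return (b & 0x80) == 0


# "A" is ASCII; "é" is not, so UTF-8 encodes it as two bytes.
data = "Aé".encode("utf-8")

# One flag per byte: True for ASCII bytes, False for bytes with the high bit set.
flags = [is_ascii_byte(b) for b in data]
print(flags)
```

The single ASCII byte reports True, and both bytes of the non-ASCII character report False.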
ASCII uses only the lowest 7 bits of an 8-bit byte, so it covers all combinations from 00000000 to 01111111. All 128 byte values in this range are mapped to a specific character.
UTF-8 keeps these exact mappings. The character represented by 01101011 in ASCII is also represented by the same byte in UTF-8. All other characters are encoded in sequences of multiple bytes in which each byte has the highest bit set; i.e. every byte of all non-ASCII characters in UTF-8 is of the form 1xxxxxxx.
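To check this yourself, the following Python sketch encodes the character for 01101011 (the letter k) in both ASCII and UTF-8 and compares the bytes, then does the same for one non-ASCII character. The helper name bits and the choice of "€" as the sample character are only for this example.

```python
def bits(data: bytes) -> list:
    """Render each byte as an 8-character binary string."""
    return [format(b, "08b") for b in data]


# The same character encoded with both encodings.
k_ascii = "k".encode("ascii")
k_utf8 = "k".encode("utf-8")
print(bits(k_ascii), bits(k_utf8))

# A non-ASCII character: every byte of its UTF-8 form has the highest bit set.
euro_utf8 = "€".encode("utf-8")
print(bits(euro_utf8))
all_high_bits_set = all(b & 0x80 for b in euro_utf8)

# Bytes written as ASCII decode unchanged when read as UTF-8.
round_trip = "hello".encode("ascii").decode("utf-8")
```

Because the ASCII bytes are identical in both encodings, a file written as ASCII can be read as UTF-8 without any conversion.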