
Why is UTF-8 compatible with ASCII?

"A" in UTF-8 is U+0041 LATIN CAPITAL LETTER A. "A" in ASCII is 065 (decimal).

How is UTF-8 backwards-compatible with ASCII?

Isara Rungvitayakul asked Apr 12 '13 07:04



2 Answers

Why:

Because everything was already in ASCII, and having a backwards-compatible Unicode encoding made adoption much easier. It's much easier to convert a program to use UTF-8 than UTF-16, and the converted program inherits the backwards compatibility: it still works with ASCII data unchanged.

How:

ASCII is a 7-bit encoding, but in practice it is stored in bytes, which are 8 bits wide. That means the highest bit of every ASCII byte has always been unused (it is always 0).

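UTF-8 simply uses that extra bit to signify non-ASCII characters: a byte with the highest bit clear is a plain ASCII character, and a byte with the highest bit set belongs to a multi-byte sequence.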
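To illustrate the point, here is a minimal Python sketch (the sample string and variable names are my own, chosen only for illustration). It checks that pure ASCII text produces identical bytes under both encodings and that every one of those bytes has its highest bit clear.

```python
# Sample text containing only ASCII characters (an arbitrary example).
text = "Hello, ASCII!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure ASCII text, both encodings yield exactly the same bytes.
same_bytes = ascii_bytes == utf8_bytes

# Every byte is below 0x80, i.e. its highest bit is 0.
high_bit_clear = all(b < 0x80 for b in utf8_bytes)

# "A" is code point U+0041 (decimal 65) and is stored as the single byte 0x41.
a_code_point = ord("A")
a_utf8 = "A".encode("utf-8")

print(same_bytes, high_bit_clear, a_code_point, a_utf8)
```

This also addresses the question's example: U+0041 and 065 are the same number written in hexadecimal and decimal, and UTF-8 stores it as that one byte.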

Pubby answered Jan 02 '23 21:01



ASCII uses only the low 7 bits of an 8-bit byte, that is, all combinations from 00000000 to 01111111. Each of the 128 byte values in this range is mapped to a specific character.

UTF-8 keeps these exact mappings. The character represented by 01101011 in ASCII (the letter k) is represented by the same byte in UTF-8. All other characters are encoded as sequences of multiple bytes in which each byte has the highest bit set; i.e. every byte of every non-ASCII character in UTF-8 is of the form 1xxxxxxx.
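A rough Python sketch of the same idea, assuming two arbitrary sample characters (the euro sign U+20AC and e with acute U+00E9) and a hypothetical helper `bit_patterns` written just for this example:

```python
# The byte 01101011 decodes to the same character under both encodings.
byte = bytes([0b01101011])
ascii_char = byte.decode("ascii")
utf8_char = byte.decode("utf-8")

def bit_patterns(s):
    """Return the UTF-8 bytes of s as 8-character binary strings."""
    return [format(b, "08b") for b in s.encode("utf-8")]

# A non-ASCII character becomes several bytes, each starting with 1.
euro_bits = bit_patterns("\u20ac")

# Mixed text: the ASCII letter keeps its single 0xxxxxxx byte,
# while the accented letter takes two 1xxxxxxx bytes.
mixed_bits = bit_patterns("a\u00e9")

print(ascii_char, utf8_char)
print(euro_bits)
print(mixed_bits)
```

Because the leading bit alone separates the two cases, a decoder can tell ASCII bytes from bytes of a multi-byte sequence without any extra context.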

deceze answered Jan 02 '23 21:01
