Are the first 128 characters of utf-8 and ascii identical?

Question

utf-8 table

Ascii table

Rob Napier · Accepted Answer

Yes. This was an intentional choice in the design of UTF-8 so that existing 7-bit ASCII would be compatible.
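As a quick sanity check of the claim above, here is a minimal Python sketch. The variable names are only illustrative. It encodes code points 0 through 127 with both codecs and compares the resulting bytes.

```python
# Build a string containing every code point in the 7-bit ASCII range.
text = "".join(chr(cp) for cp in range(128))

# Encode it with both codecs.
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# Both encodings should yield exactly one byte per character, equal to
# the code point itself.
identical = ascii_bytes == utf8_bytes == bytes(range(128))
print(identical)
```

If the two encodings agree on this range, `identical` is `True` and each character occupies a single byte in both.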

UTF-8 is also intentionally designed so that a byte in the 7-bit ASCII range can never mean anything except its ASCII character. For example, in big-endian UTF-16, the Euro symbol (€, U+20AC) is encoded as 0x20 0xAC. But 0x20 is SPACE in ASCII. So if an ASCII-only algorithm tries to space-delimit a string like "€ 10" encoded in UTF-16, it will split in the middle of the € character and corrupt the data.

This can't happen in UTF-8. There, € is encoded as 0xE2 0x82 0xAC, and every byte of a multi-byte sequence has its high bit set, so none of them is a legal 7-bit ASCII value. An ASCII algorithm that naively splits on the ASCII SPACE (0x20) will therefore still work, even though it knows nothing about UTF-8. (The same is true for any other ASCII character, such as slash, comma, backslash, or percent.) UTF-8 is an incredibly clever text encoding.
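To make the difference concrete, here is a small Python sketch of the "naive ASCII split" described above. It assumes big-endian UTF-16 without a byte order mark (Python's `utf-16-be` codec), and the variable names are only illustrative.

```python
text = "€ 10"

# The same string in two encodings.
utf16 = text.encode("utf-16-be")  # bytes: 20 AC 00 20 00 31 00 30
utf8 = text.encode("utf-8")       # bytes: E2 82 AC 20 31 30

# A byte-oriented algorithm that only knows ASCII splits on 0x20.
utf16_parts = utf16.split(b"\x20")
utf8_parts = utf8.split(b"\x20")

# UTF-16: the first byte of the euro sign is 0x20, so the split lands
# inside the character and the pieces are no longer valid text.
print(utf16_parts)

# UTF-8: 0x20 only ever means SPACE, so each piece decodes cleanly.
print([part.decode("utf-8") for part in utf8_parts])
```

The UTF-8 pieces decode back to "€" and "10", while the UTF-16 split produces three fragments, the first of which is empty because the € itself starts with 0x20.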

Tags:

encoding

ascii

utf-8

Sebastian Nielsen
