Are the first 128 characters of utf-8 and ascii identical?
utf-8 table
Ascii table
Yes. This was an intentional choice in the design of UTF-8 so that existing 7-bit ASCII would be compatible.
The encoding is also designed intentionally so that 7-bit ASCII values cannot mean anything except their ASCII equivalent. For example, in UTF-16, the Euro symbol (€) is encoded as 0x20 0xAC. But 0x20 is SPACE in ASCII. So if an ASCII-only algorithm tries to space-delimit a string like "€ 10" encoded in UTF-16, it'll corrupt the data.
This can't happen in UTF-8. € is encoded there as 0xE2 0x82 0xAC, none of which are legal 7-bit ASCII values. So an ASCII algorithm that naively splits on the ASCII SPACE (0x20) will still work, even though it doesn't know anything about UTF-8 encoding. (The same is true for any ASCII character like slash, comma, backslash, percent, etc.) UTF-8 is an incredibly clever text encoding.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With