I'm looking for a table which contains ASCII characters and same looking UTF8 characters. I know it also depends on the font is they look the same, but something generic to start with is enough.
>>> # PY3 code:
>>> a='H' # ascii
>>> b='Н' # utf8
>>> a==b
False
>>> ' '.join(format(ord(x), 'b') for x in a)
'1001000'
>>> ' '.join(format(ord(x), 'b') for x in b)
'10000011101'
>>> a='P' # ascii
>>> b='Ρ' # utf8
>>> a==b
False
>>> ' '.join(format(ord(x), 'b') for x in a)
'1010000'
>>> ' '.join(format(ord(x), 'b') for x in b)
'1110100001'
For characters represented by the 7-bit ASCII character codes, the UTF-8 representation is exactly equivalent to ASCII, allowing transparent round trip migration. Other Unicode characters are represented in UTF-8 by sequences of up to 6 bytes, though most Western European characters require only 2 bytes3.
UTF-8 is backward-compatible with ASCII and can represent any standard Unicode character. The first 128 UTF-8 characters precisely match the first 128 ASCII characters (numbered 0-127), meaning that existing ASCII text is already valid UTF-8. All other characters use two to four bytes.
Why did UTF-8 replace the ASCII character-encoding standard? UTF-8 can store a character in more than one byte. UTF-8 replaced the ASCII character-encoding standard because it can store a character in more than a single byte. This allowed us to represent a lot more character types, like emoji.
This is very useful tool as it will show you all characters which look similar and you can choose if this is REALLY similar enough for you :)
https://unicode.org/cldr/utility/confusables.jsp?a=test&r=None
Some other resources:
This is called Visual Spoofing
Python Package to detect confusables
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With