I have a unicode string like "𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊" and would like to convert it to the ASCII form "thug life".
I know I can achieve this in Python by
import unidecode
print(unidecode.unidecode('𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊'))
// thug life
However, this would asciify also other unicode characters (such as Chinese/Japanese characters, emojis, accented characters, etc.), which I want to preserve.
Is there a way to detect these type of "artistic" unicode characters?
Some more examples:
𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮
𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒
𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖
thug life
Thanks for your help!
import unicodedata
strings = [
'𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊',
'𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮',
'𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒',
'𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖',
'thug life']
for x in strings:
print(unicodedata.normalize( 'NFKC', x), x)
Output: .\62803325.py
thug life 𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊 thug life 𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮 thug life 𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒 thug life 𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖 thug life thug life
Resources:
unicodedata
— Unicode Database
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With