I have a unicode string like "๐๐๐๐ ๐๐๐๐" and would like to convert it to the ASCII form "thug life".
I know I can achieve this in Python by
import unidecode
print(unidecode.unidecode('๐๐๐๐ ๐๐๐๐'))
// thug life
However, this would asciify also other unicode characters (such as Chinese/Japanese characters, emojis, accented characters, etc.), which I want to preserve.
Is there a way to detect these type of "artistic" unicode characters?
Some more examples:
๐ฝ๐ฑ๐พ๐ฐ ๐ต๐ฒ๐ฏ๐ฎ
๐๐ฝ๐๐ ๐๐พ๐ป๐
๐ฅ๐๐ฆ๐ ๐๐๐๐
๏ฝ๏ฝ๏ฝ๏ฝ ๏ฝ๏ฝ๏ฝ๏ฝ
Thanks for your help!
import unicodedata
strings = [
'๐๐๐๐ ๐๐๐๐',
'๐ฝ๐ฑ๐พ๐ฐ ๐ต๐ฒ๐ฏ๐ฎ',
'๐๐ฝ๐๐ ๐๐พ๐ป๐',
'๐ฅ๐๐ฆ๐ ๐๐๐๐',
'๏ฝ๏ฝ๏ฝ๏ฝ ๏ฝ๏ฝ๏ฝ๏ฝ
']
for x in strings:
print(unicodedata.normalize( 'NFKC', x), x)
Output: .\62803325.py
thug life ๐๐๐๐ ๐๐๐๐ thug life ๐ฝ๐ฑ๐พ๐ฐ ๐ต๐ฒ๐ฏ๐ฎ thug life ๐๐ฝ๐๐ ๐๐พ๐ป๐ thug life ๐ฅ๐๐ฆ๐ ๐๐๐๐ thug life ๏ฝ๏ฝ๏ฝ๏ฝ ๏ฝ๏ฝ๏ฝ๏ฝ
Resources:
unicodedata โ Unicode Database
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With