 

Convert fancy/artistic unicode text to ASCII

I have a unicode string like "𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊" and would like to convert it to the ASCII form "thug life".

I know I can do this in Python with the unidecode package:

import unidecode
print(unidecode.unidecode('𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊'))
# thug life

However, this would also asciify other Unicode characters (such as Chinese/Japanese characters, emojis, and accented letters), which I want to preserve.

Is there a way to detect this type of "artistic" Unicode character?

Some more examples:

𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮

𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒

𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖

thug life

Thanks for your help!

asked Jul 08 '20 at 20:07 by Martin



1 Answer

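Use Unicode normalization form NFKC. These styled letters are Unicode's "Mathematical Alphanumeric Symbols", which have compatibility decompositions to plain Latin letters, so NFKC maps them back to ASCII:

import unicodedata

strings = [
    '𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊',  # bold Fraktur
    '𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮',  # bold script
    '𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒',  # script
    '𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖',  # double-struck
    'thug life',    # plain ASCII, left unchanged
]
for x in strings:
    # NFKC = compatibility decomposition followed by canonical composition;
    # it replaces each styled letter with its plain equivalent.
    print(unicodedata.normalize('NFKC', x), x)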

Output of .\62803325.py:

thug life 𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊
thug life 𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮
thug life 𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒
thug life 𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖
thug life thug life
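Note that NFKC is a general compatibility normalization, not a "fancy text" filter. The sketch below (the sample strings are illustrative and not from the answer) shows that it leaves CJK characters, emoji and precomposed accented letters alone, but it also rewrites other compatibility characters such as ligatures, superscripts, full-width Latin letters and half-width katakana, which may or may not be what you want.

```python
import unicodedata

# Characters the question wants to keep: CJK, emoji, and an accented letter.
kept = "漢字 かな 😀 café"
# Compatibility characters that NFKC also rewrites: a ligature, a superscript,
# a full-width Latin letter, and a half-width katakana letter.
folded = "ﬁ ² Ａ ｶ"

print(unicodedata.normalize("NFKC", kept))    # unchanged
print(unicodedata.normalize("NFKC", folded))  # fi 2 A カ
```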

Resources:

  • unicodedata — Unicode Database
  • Normalization forms for Unicode text
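If you only want to flatten the styled letters themselves (which also answers the "how do I detect them" part of the question), one possible approach is to look at each character's decomposition tag: unicodedata.decomposition() returns e.g. '<font> 0074' for '𝖙', and all four examples in the question use <font>-tagged characters. The helper names is_styled, unstyle and STYLE_TAGS below are hypothetical, and treating only <font> as "styled" is an assumption; other fancy-text styles use different tags (for example <circle> for Ⓐ or <wide> for full-width Ａ) and could be added to the set.

```python
import unicodedata

# Hypothetical set of compatibility tags treated as "styled" text. Only <font>
# is needed for the question's examples; add "<circle>", "<wide>", etc. if
# you also want to flatten those styles.
STYLE_TAGS = {"<font>"}


def is_styled(ch: str) -> bool:
    """Return True if ch has a compatibility decomposition with a style tag."""
    decomposition = unicodedata.decomposition(ch)  # '' if ch has none
    return decomposition.split(" ", 1)[0] in STYLE_TAGS


def unstyle(text: str) -> str:
    """Apply NFKC only to styled characters; leave everything else as-is."""
    return "".join(
        unicodedata.normalize("NFKC", ch) if is_styled(ch) else ch
        for ch in text
    )


if __name__ == "__main__":
    for s in ["𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊", "𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖", "𝓽𝓱𝓾𝓰 ﬁ² 漢字 😀 café"]:
        print(unstyle(s))
```

Unlike a plain NFKC pass, this keeps the ligature, the superscript, the CJK characters, the emoji and the accented letter untouched, at the cost of per-character lookups.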
answered Sep 30 '22 at 10:09 by JosefZ
