Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert fancy/artistic unicode text to ASCII

I have a unicode string like "๐–™๐–๐–š๐–Œ ๐–‘๐–Ž๐–‹๐–Š" and would like to convert it to the ASCII form "thug life".

I know I can achieve this in Python by

import unidecode
print(unidecode.unidecode('๐–™๐–๐–š๐–Œ ๐–‘๐–Ž๐–‹๐–Š'))
// thug life

However, this would asciify also other unicode characters (such as Chinese/Japanese characters, emojis, accented characters, etc.), which I want to preserve.

Is there a way to detect these type of "artistic" unicode characters?

Some more examples:

๐“ฝ๐“ฑ๐“พ๐“ฐ ๐“ต๐“ฒ๐“ฏ๐“ฎ

๐“‰๐’ฝ๐“Š๐‘” ๐“๐’พ๐’ป๐‘’

๐•ฅ๐•™๐•ฆ๐•˜ ๐•๐•š๐•—๐•–

๏ฝ”๏ฝˆ๏ฝ•๏ฝ‡ ๏ฝŒ๏ฝ‰๏ฝ†๏ฝ…

Thanks for your help!

like image 987
Martin Avatar asked Jul 08 '20 20:07

Martin


1 Answers

import unicodedata
strings = [
  '๐–™๐–๐–š๐–Œ ๐–‘๐–Ž๐–‹๐–Š',
  '๐“ฝ๐“ฑ๐“พ๐“ฐ ๐“ต๐“ฒ๐“ฏ๐“ฎ',
  '๐“‰๐’ฝ๐“Š๐‘” ๐“๐’พ๐’ป๐‘’',
  '๐•ฅ๐•™๐•ฆ๐•˜ ๐•๐•š๐•—๐•–',
  '๏ฝ”๏ฝˆ๏ฝ•๏ฝ‡ ๏ฝŒ๏ฝ‰๏ฝ†๏ฝ…']
for x in strings:
  print(unicodedata.normalize( 'NFKC', x), x)

Output: .\62803325.py

thug life ๐–™๐–๐–š๐–Œ ๐–‘๐–Ž๐–‹๐–Š
thug life ๐“ฝ๐“ฑ๐“พ๐“ฐ ๐“ต๐“ฒ๐“ฏ๐“ฎ
thug life ๐“‰๐’ฝ๐“Š๐‘” ๐“๐’พ๐’ป๐‘’
thug life ๐•ฅ๐•™๐•ฆ๐•˜ ๐•๐•š๐•—๐•–
thug life ๏ฝ”๏ฝˆ๏ฝ•๏ฝ‡ ๏ฝŒ๏ฝ‰๏ฝ†๏ฝ…

Resources:

  • unicodedata โ€” Unicode Database
  • Normalization forms for Unicode text
like image 85
JosefZ Avatar answered Sep 30 '22 10:09

JosefZ



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!