Python

Question

I have an issue with one of my current weekend projects. I am writing a Python script that fetches some data from different sources and then spits everything out to an ESC/POS printer. As you might imagine, POS printers don't exactly like emojis...

So text like this:

可爱!!!!!!!!😍😍😍😍😍😍😍😝

gives me this character string:

'\u53ef\u7231!!!!!!!!\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f61d'

The result that comes out of the printer is, of course, quite different from what I would like. So I need to replace these non-ASCII characters with something else. I don't really care about the leading characters, but I do care about the emojis. Using something like unidecode(str(text)) will at least strip them out, but I want to convert them to something more useful: either classic smileys like [:-D] or names like [SMILING FACE WITH HEART-SHAPED EYES].

My problem is... how would one go about doing this? Manually creating a lookup table for the most common emojis seems a bit tedious, so I am wondering if there is something else that I can do.

user3082900 · Accepted Answer

With the tip about unicodedata.name and some further research, I managed to put this together:

import unicodedata
from unidecode import unidecode

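def deEmojify(inputString):
    returnString = ""

    for character in inputString:
        try:
            # Plain ASCII characters pass through unchanged.
            character.encode("ascii")
            returnString += character
        except UnicodeEncodeError:
            # Try unidecode's ASCII transliteration first (e.g. "š" -> "s").
            replaced = unidecode(str(character))
            if replaced != '':
                returnString += replaced
            else:
                try:
                    # No transliteration: fall back to the Unicode name.
                    returnString += "[" + unicodedata.name(character) + "]"
                except ValueError:
                    # The character has no name: use a generic marker.
                    returnString += "[x]"

    return returnString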

Basically, it first tries to find the most appropriate ASCII representation. If that fails, it tries the Unicode name, and if even that fails, it replaces the character with a simple marker.

For example, taking this string:

abcdšeđfčgžhÅiØjÆk 可爱!!!!!!!!😍😍😍😍😍😍😍😝

And running the function:

string = u'abcdšeđfčgžhÅiØjÆk \u53ef\u7231!!!!!!!!\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f61d'
print(deEmojify(string))

Will produce the following result:

abcdsedfcgzhAiOjAEk[x] Ke Ai !!!!!!!![SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][FACE WITH STUCK-OUT TONGUE AND TIGHTLY-CLOSED EYES]
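If classic smileys are preferred for a handful of emojis, a small lookup table can be checked before falling back to the Unicode name. The following is only a minimal sketch under a few assumptions: the SMILEYS table and the emoji_to_ascii name are made up for illustration, the smiley choices are arbitrary, and the unidecode step is left out to keep the example standard-library only (so characters like 可 would come out as their Unicode names rather than "Ke").

```python
import unicodedata

# Hypothetical hand-picked table; extend it with the emojis you care about.
SMILEYS = {
    "\U0001f60d": "[<3_<3]",  # SMILING FACE WITH HEART-SHAPED EYES
    "\U0001f61d": "[xP]",     # FACE WITH STUCK-OUT TONGUE AND TIGHTLY-CLOSED EYES
    "\U0001f600": "[:-D]",    # GRINNING FACE
}


def emoji_to_ascii(text):
    """Replace each non-ASCII character with a smiley, its Unicode name, or [x]."""
    parts = []
    for ch in text:
        if ord(ch) < 128:
            # Plain ASCII passes through unchanged.
            parts.append(ch)
        elif ch in SMILEYS:
            # A hand-written smiley takes priority over the generic name.
            parts.append(SMILEYS[ch])
        else:
            # unicodedata.name returns the second argument when no name exists.
            parts.append("[" + unicodedata.name(ch, "x") + "]")
    return "".join(parts)


print(emoji_to_ascii("!!!!\U0001f60d\U0001f61d\U0001f60e"))
```

Only the emojis that matter need an entry in the table; everything else still gets a readable name, so the table can stay short.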

BoarGules · Answer

Try this:

import unicodedata
print(unicodedata.name(u'\U0001f60d'))

The result is:

SMILING FACE WITH HEART-SHAPED EYES
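One thing to watch for: unicodedata.name raises ValueError for characters that have no name, such as private-use code points. It also accepts an optional second argument that is returned instead. A small sketch, where the describe helper and the "x" placeholder are just illustrative choices:

```python
import unicodedata


def describe(ch, default="x"):
    # The optional second argument is returned when the character has no name,
    # which avoids having to catch ValueError.
    return unicodedata.name(ch, default)


print(describe("\U0001f60d"))  # SMILING FACE WITH HEART-SHAPED EYES
print(describe("\ue000"))      # x (private-use code point, no name)
```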

Python - replace unicode emojis with ASCII characters

Tags:

unicode

emoji

user3082900

2 Answers

user3082900

BoarGules
