Substitute Emoji with its description or name

Question

I'm working on getting a subset of emojis from a text retrieved from an API. What I'd like to do is substitute each emoji with its description or name.

I'm working on Python 3.4 and my current approach is accessing the character's Unicode name with unicodedata like this:

nname = unicodedata.name(my_unicode)

And I'm substituting with re.sub:

re.sub('[\U0001F602-\U0001F64F]', 'new string', str(orig_string))

I've tried re.search and then accessing the matches and replacing the strings (that doesn't work with regex), but I haven't been able to solve this.

Is there a way of getting a callback for each substitution that re.sub does? Any other route is also appreciated.

tobias_k · Accepted Answer

You can pass a callback function to re.sub. From the documentation:

re.sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; [...] If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.

So just use unicodedata.name as the callback:

>>> import re, unicodedata
>>> my_text = "\U0001F602  and all of this \U0001F605"
>>> re.sub('[\U0001F602-\U0001F64F]', lambda m: unicodedata.name(m.group()), my_text)
'FACE WITH TEARS OF JOY  and all of this SMILING FACE WITH OPEN MOUTH AND COLD SWEAT'
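
One caveat with the lambda above: unicodedata.name raises ValueError for a character that has no name in the Unicode database bundled with your interpreter, which can happen for some code points in that range on older Python versions. A minimal sketch of a named callback that falls back to the code point instead, assuming you also want the name wrapped in colons so it stands out from the surrounding text (describe_emoji, replace_emoji and the :name: format are illustrative choices, not part of re or unicodedata):

```python
import re
import unicodedata

# Same range as in the question; widen it if you need more emoji.
EMOJI_PATTERN = re.compile('[\U0001F602-\U0001F64F]')

def describe_emoji(match):
    """Callback for re.sub: return a readable placeholder for one match."""
    char = match.group()
    # The second argument is returned instead of raising ValueError
    # when the character has no name in this Python's Unicode database.
    fallback = 'U+{:04X}'.format(ord(char))
    name = unicodedata.name(char, fallback)
    # Hypothetical output format: lower-case, underscores, wrapped in colons.
    return ':{}:'.format(name.lower().replace(' ', '_'))

def replace_emoji(text):
    # re.sub calls describe_emoji once per non-overlapping match.
    return EMOJI_PATTERN.sub(describe_emoji, text)

print(replace_emoji("so funny \U0001F602!"))
# so funny :face_with_tears_of_joy:!
```

Compiling the pattern once is optional, but it keeps the range in one place if you later decide to cover more blocks than the one in the question.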

Tags:

python

regex

python-3.x

unicode

Jose Torres