I'm trying to extract a subset of emojis from text retrieved from an API. What I'd like to do is replace each emoji with its description or name.
I'm working with Python 3.4, and my current approach is to look up the character's Unicode name with unicodedata like this:
nname = unicodedata.name(my_unicode)
And I'm substituting with re.sub:
re.sub('[\U0001F602-\U0001F64F]', 'new string', str(orig_string))
I've tried re.search, then accessing the matches and replacing the strings manually (without a regex substitution), but I haven't been able to solve this.
Is there a way of getting a callback for each substitution that re.sub does? Any other route is also appreciated.
You can pass a callback function to re.sub. From the documentation:
re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; [...] If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.
So just wrap unicodedata.name in a callback that pulls the matched character out of the match object:
>>> import re, unicodedata
>>> my_text = "\U0001F602 and all of this \U0001F605"
>>> re.sub('[\U0001F602-\U0001F64F]', lambda m: unicodedata.name(m.group()), my_text)
'FACE WITH TEARS OF JOY and all of this SMILING FACE WITH OPEN MOUTH AND COLD SWEAT'
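If you want something a bit more reusable than the lambda, here is a minimal sketch along the same lines. The names EMOJI_PATTERN, emoji_to_name and replace_emojis, the ':name:' output format and the 'UNKNOWN EMOJI' fallback are my own choices for illustration, not anything from the question. It relies on the optional default argument of unicodedata.name, which is returned instead of raising ValueError when a character has no name in your Python's Unicode database:

```python
import re
import unicodedata

# Same range as in the question; widen or narrow it to cover the emojis you need.
EMOJI_PATTERN = re.compile('[\U0001F602-\U0001F64F]')

def emoji_to_name(match):
    """Callback for re.sub: receives a match object, returns the replacement string."""
    char = match.group()
    # Fall back to a placeholder (hypothetical choice) if the character is unnamed.
    name = unicodedata.name(char, 'UNKNOWN EMOJI')
    # Illustrative formatting: ':face_with_tears_of_joy:' style.
    return ':' + name.lower().replace(' ', '_') + ':'

def replace_emojis(text):
    """Replace every character matched by EMOJI_PATTERN with its formatted name."""
    return EMOJI_PATTERN.sub(emoji_to_name, text)

print(replace_emojis("\U0001F602 and all of this \U0001F605"))
# :face_with_tears_of_joy: and all of this :smiling_face_with_open_mouth_and_cold_sweat:
```

Compiling the pattern once and using a named function mostly helps readability; the behaviour is the same as passing the lambda to re.sub directly.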