
Filter out multiple emojis from Unicode text in Python [duplicate]

Let's say we have the following strings containing emojis:

sent1 = '😂 😂 right'
sent2 = 'Some text?! 🖑😂😂😂😂'
sent3 = '😂'

The task is to remove the text and keep only the emojis, giving the following output:

sent1_emojis = '😂 😂 '
sent2_emojis = ' 🖑😂😂😂😂'
sent3_emojis = '😂'

Based on a past question (Regex Emoji Unicode), I use the following regex to identify strings that contain at least one emoji:

import re

emoji_pattern = re.compile(
    u".*(["
    u"\U0001F600-\U0001F64F"  # emoticons
    u"\U0001F300-\U0001F5FF"  # symbols & pictographs
    u"\U0001F680-\U0001F6FF"  # transport & map symbols
    u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
    u"])+",
    flags=re.UNICODE)

In order to get the output string I use the following:

re.match(emoji_pattern, sent1).group(0)

and so on.

There's a problem with the sent2 string: re.match(emoji_pattern, sent2).group(0) returns the whole of sent2 instead of the emojis only.

balkon16 asked May 03 '26 15:05


1 Answer

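A small change to emoji_pattern will do the job: drop the leading .* and the trailing + so that the pattern matches one emoji at a time, then collect every match with re.findall: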

emoji_pattern = re.compile(
    u"(["                     # .* removed
    u"\U0001F600-\U0001F64F"  # emoticons
    u"\U0001F300-\U0001F5FF"  # symbols & pictographs
    u"\U0001F680-\U0001F6FF"  # transport & map symbols
    u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
    u"])",                    # + removed
    flags=re.UNICODE)

for sent in [sent1, sent2, sent3]:
    print(''.join(re.findall(emoji_pattern, sent)))

😂😂
🖑😂😂😂😂
😂
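
To see why the original pattern returned all of sent2, here is a minimal sketch, assuming Python 3 (where each emoji is a single code point). The names EMOJI_CHARS, greedy_pattern and extract_emojis are made up for illustration, and the emojis are written as \U escapes so the snippet does not depend on file encoding. The four ranges are the ones from the question and do not cover every emoji.

```python
import re

# Same four ranges as in the question; many emojis fall outside them.
EMOJI_CHARS = (
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
)

# Pattern from the question: a leading .* followed by a repeated emoji group.
greedy_pattern = re.compile(".*([" + EMOJI_CHARS + "])+")

# Pattern in the spirit of the answer: exactly one emoji per match.
emoji_pattern = re.compile("[" + EMOJI_CHARS + "]")


def extract_emojis(text):
    """Hypothetical helper: join every single-emoji match found in text."""
    return "".join(emoji_pattern.findall(text))


# sent2 from the question: 'Some text?! ' followed by U+1F591 and four U+1F602.
sent2 = "Some text?! \U0001F591\U0001F602\U0001F602\U0001F602\U0001F602"

# group(0) is the entire match, so it includes the text consumed by .*
whole_match = greedy_pattern.match(sent2).group(0)
print(whole_match == sent2)

# findall keeps only the emoji characters and drops everything else,
# including the spaces between them.
print(extract_emojis(sent2) == "\U0001F591" + "\U0001F602" * 4)
```

Note that this approach also drops the spaces between emojis, which is why the output above shows the two emojis of sent1 without the space between them.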
Chris answered May 06 '26 03:05