Filter out multiple emojis from Unicode text in Python [duplicate]

Question

Let's say we have the following strings containing emojis:

sent1 = '😂 😂 right'
sent2 = 'Some text?! 🖑😂😂😂😂'
sent3 = '😂'

The task is to remove the text and get the following output:

sent1_emojis = '😂 😂 '
sent2_emojis = ' 🖑😂😂😂😂'
sent3_emojis = '😂'

Based on a past question (Regex Emoji Unicode), I use the following regex to identify strings that contain at least one emoji:

emoji_pattern = re.compile(u".*(["
u"\U0001F600-\U0001F64F"  # emoticons
u"\U0001F300-\U0001F5FF"  # symbols & pictographs
u"\U0001F680-\U0001F6FF"  # transport & map symbols
u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                "])+", flags= re.UNICODE)

In order to get the output string I use the following:

re.match(emoji_pattern, sent1).group(0)

and so on.

There's a problem with the sent2 string: re.match(emoji_pattern, sent2).group(0) returns the whole of sent2 instead of the emojis only.
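To see why the whole string comes back, it may help to look at what the leading `.*` does. The sketch below rebuilds the question's pattern (the names `EMOJI_CLASS`, `greedy_pattern`, `whole_match` and `last_group` are illustrative, not from the original post). Because `.*` is greedy, it consumes as much as it can and only backtracks far enough for the character class to match once, so `group(0)` spans everything from the start of the string up to the last emoji.

```python
import re

# Same four ranges as in the question, written as one character class.
EMOJI_CLASS = (
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "]"
)

# The question's pattern: greedy ".*" followed by a repeated emoji group.
greedy_pattern = re.compile(".*(" + EMOJI_CLASS + ")+")

# Equivalent to sent2: text, a space, one hand symbol, four "face with tears of joy".
sent2 = "Some text?! \U0001F591" + "\U0001F602" * 4

match = greedy_pattern.match(sent2)
whole_match = match.group(0)  # everything ".*" swallowed plus the final emoji
last_group = match.group(1)   # only the last repetition of the group is kept

print(whole_match == sent2)
print(len(last_group))
```

Here `group(0)` equals the full input because the string ends in an emoji, and `group(1)` holds just that final emoji, so neither group gives the emoji-only string.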

Chris · Accepted Answer

A small change to emoji_pattern will do the job:

import re

emoji_pattern = re.compile(u"(["                     # .* removed
u"\U0001F600-\U0001F64F"  # emoticons
u"\U0001F300-\U0001F5FF"  # symbols & pictographs
u"\U0001F680-\U0001F6FF"  # transport & map symbols
u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                "])", flags=re.UNICODE)              # + removed

# findall returns every single-emoji match; join them back into one string
for sent in [sent1, sent2, sent3]:
    print(''.join(emoji_pattern.findall(sent)))

Output:

😂😂
🖑😂😂😂😂
😂
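As a self-contained variant of the approach above, the same idea can be wrapped in a small helper. This is only a sketch: the names `EMOJI_PATTERN` and `extract_emojis` are hypothetical, and the character class reuses the four ranges from the post, which do not cover every emoji (for example, code points in U+1F900 to U+1F9FF fall outside them).

```python
import re

# The four ranges from the answer; not an exhaustive emoji definition.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "]"
)


def extract_emojis(text):
    """Return the characters of `text` that match EMOJI_PATTERN, in order."""
    # With no capture group, findall returns each matched character.
    return "".join(EMOJI_PATTERN.findall(text))


# Strings equivalent to sent1, sent2 and sent3 from the question.
sentences = [
    "\U0001F602 \U0001F602 right",
    "Some text?! \U0001F591" + "\U0001F602" * 4,
    "\U0001F602",
]

for sentence in sentences:
    print(extract_emojis(sentence))
```

Joining the individual matches drops the whitespace between emojis, so the first result is two adjacent emojis rather than the space-separated form shown in the question.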

Tags: python, regex, unicode, emoji

balkon16
