How to select only rows containing emojis and emoticons in Python?

I have a pandas DataFrame in Python like the one below:

sentence
------------
😎🤘🏾
I like it
+1😍😘
One :-) :)
hah

I need to select only the rows that contain emoticons or emojis, so the result should look like this:

sentence
------------
😎🤘🏾
+1😍😘
One :-) :)

How can I do that in Python?

dingaro asked Oct 19 '25 02:10


1 Answer

You can select the Unicode emojis with a regex character range:

df2 = df[df['sentence'].str.contains(r'[\u263a-\U0001f645]')]

output:

  sentence
0      😎🤘🏾
2     +1😍😘
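
To get a feel for what that range actually covers, here is a minimal sketch using only the standard `re` module. The sample strings are assumptions modeled on the question, and `EMOJI_RANGE` is just an illustrative name. Since the range spans every code point from U+263A to U+1F645, it also matches many non-emoji characters (the last sample is a CJK string), so treat it as a rough filter rather than a precise emoji detector.

```python
import re

# Same character range as in the answer: every code point from U+263A to U+1F645.
EMOJI_RANGE = re.compile(r"[\u263a-\U0001f645]")

# Sample strings (assumed); the last one contains CJK ideographs, not emojis.
samples = ["😎🤘🏾", "I like it", "One :-) :)", "hah", "漢字"]

# Map each sample to whether the range matches anywhere in it.
matches = {text: bool(EMOJI_RANGE.search(text)) for text in samples}

for text, hit in matches.items():
    print(f"{text!r}: {hit}")
```

If your data can contain non-Latin scripts, a narrower set of ranges or a dedicated emoji library may be a better fit.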

ASCII emoticons are harder to pin down, since there is no standard definition and the possible combinations are practically endless. If you limit the match to smiley faces made of eyes (':' or ';'), at most one non-space character in between (such as a nose), and a mouth (')' or '('), you could use:

df[df['sentence'].str.contains(r'[\u263a-\U0001f645]|(?:[:;]\S?[\)\(])')]

output:

     sentence
0         😎🤘🏾
2        +1😍😘
3  One :-) :)

This still misses plenty of other ASCII emoticons, such as :O, :P, and 8D.
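
One possible way to widen the ASCII part is sketched below. The eye, nose, and mouth character sets are assumptions chosen for illustration, not a standard, and the names `EMOTICON`, `PATTERN`, and `has_emoji_or_emoticon` are hypothetical. The lookarounds require that the emoticon is not glued to a word character, which is meant to cut down on false positives such as "8D" inside a longer token.

```python
import re

# Unicode range taken from the answer above.
EMOJI_RANGE = r"[\u263a-\U0001f645]"

# Assumed emoticon shape: eyes, optional nose, mouth.
# (?<!\w) and (?!\w) keep it from matching inside words or numbers.
EMOTICON = r"(?<!\w)[:;8=][-^']?[)(DPpOo](?!\w)"

PATTERN = re.compile(f"{EMOJI_RANGE}|{EMOTICON}")


def has_emoji_or_emoticon(text: str) -> bool:
    """Return True if text contains a character in the range or an emoticon."""
    return PATTERN.search(text) is not None


for sample in ["One :-) :)", ":O", ":P", "8D", "I like it", "hah"]:
    print(f"{sample!r}: {has_emoji_or_emoticon(sample)}")
```

With pandas, the same pattern string should work with `str.contains` as in the answer, for example `df[df['sentence'].str.contains(PATTERN.pattern)]`. Any such list remains a heuristic, so expect to tune the character sets to your data.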

mozway answered Oct 20 '25 15:10


