I have a DataFrame in Python pandas like the one below:
sentence
------------
šš¤š¾
I like it
+1šš
One :-) :)
hah
I need to select only the rows that contain emoticons or emojis, so the result should look like this:
sentence
------------
šš¤š¾
+1šš
One :-) :)
How can I do that in Python?
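You can select the Unicode emojis with a regex character range. Note that this range (U+263A to U+1F645) is only an approximation: it also spans many non-emoji characters, and it stops short of emojis with higher code points.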
df2 = df[df['sentence'].str.contains(r'[\u263a-\U0001f645]')]
output:
sentence
0 šš¤š¾
2 +1šš
The ASCII "emoticons" are much more ambiguous, however, since there is no standard definition and the possible combinations are practically endless. If you limit the match to smiley faces that have eyes (':' or ';'), at most one non-space character in between, and a mouth (')' or '('), you could use:
df[df['sentence'].str.contains(r'[\u263a-\U0001f645]|(?:[:;]\S?[\)\(])')]
output:
sentence
0 šš¤š¾
2 +1šš
3 One :-) :)
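For reference, here is a minimal, self-contained sketch that reproduces both filters. The emoji in the sample data are stand-ins chosen for illustration (written as escape sequences so the snippet survives copy and paste), and the variable names are assumptions, not part of the original answer.

```python
import pandas as pd

# Sample data; the emoji code points are illustrative stand-ins.
df = pd.DataFrame({
    "sentence": [
        "\U0001F600\U0001F602",  # two emoji
        "I like it",
        "+1\U0001F44D",          # text followed by an emoji
        "One :-) :)",
        "hah",
    ]
})

# Unicode range used in the answer above.
emoji_only = r"[\u263a-\U0001f645]"
# Same range, plus eyes (: or ;), one optional non-space character, and a mouth.
emoji_or_smiley = r"[\u263a-\U0001f645]|(?:[:;]\S?[\)\(])"

# str.contains treats the pattern as a regex by default and returns a boolean mask.
df_emoji = df[df["sentence"].str.contains(emoji_only)]
df_both = df[df["sentence"].str.contains(emoji_or_smiley)]

print(df_emoji.index.tolist())
print(df_both.index.tolist())
```

The second pattern uses a non-capturing group, (?:...), so pandas does not warn about match groups in the pattern.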
But you would be missing plenty of other ASCII possibilities, such as :O, :P, 8D, etc.
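If you want to catch a few more of these, one option is a slightly broader pattern. The sketch below is only an assumption about what a "reasonable" emoticon looks like (eyes, an optional nose, a mouth, not glued to other word characters); the names EMOTICON and has_emoticon are hypothetical, and the character sets will need tuning for your data.

```python
import re

# Hypothetical, deliberately small emoticon grammar:
#   eyes   : one of  : ; = 8
#   nose   : optional, one of  - o ^ '
#   mouth  : one of  ) ( ] [ D P p O o / \ | *
# The lookarounds reject matches glued to word characters (e.g. "8pm", "x=1").
EMOTICON = re.compile(r"(?<!\w)[:;=8][-o^']?[)(\]\[DPpOo/\\|*](?!\w)")


def has_emoticon(text: str) -> bool:
    """Return True if the text contains at least one emoticon per the grammar above."""
    return EMOTICON.search(text) is not None


for sample in [":O", ":P", "8D", "One :-) :)", "I like it", "Call at 8pm"]:
    print(repr(sample), has_emoticon(sample))
```

To use it on the DataFrame, the same pattern string (EMOTICON.pattern) can be passed to str.contains, optionally joined to the emoji range with '|'. It will still produce false positives and misses on real text, so treat it as a starting point.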