 

Extract all possible emoticons from a Python list

Objective

I am trying to extract all possible emoticons from a Unicode word list. I am using Python 3 with an Anaconda installation, so I cannot use a package such as emoji.py.

Here is a sample bag-of-words list.

lst = ['✅','türkçe','Çile','ısp','İst','ğ','some','#','@','@one','#thing','','1','41','ç','ö','⏱','⏱','👏','₺','€',':)',':/']

Expected output is like this:

out = ['✅','⏱', '⏱','👏']

Attempt 1

A list comprehension that keeps every word containing at least one non-ASCII character:

[w for w in lst if len(w) != len(w.encode())]

However, this does not give the desired output, because the text also contains non-ASCII letters. Currency symbols are not emoticons either.

['✅', 'türkçe', 'Çile', 'ısp', 'İst', 'ğ', 'ç', 'ö', '⏱', '⏱', '👏', '₺', '€']
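
A minimal sketch of why the length comparison over-matches, assuming UTF-8 (the default for `str.encode()`). The helper name `contains_non_ascii` and the shortened sample list are illustrative only and not part of the original attempt.

```python
def contains_non_ascii(word):
    # UTF-8 uses one byte per ASCII character and two to four bytes
    # for every other character, so the lengths differ only when a
    # non-ASCII character is present.
    return len(word) != len(word.encode('utf-8'))

# 'some', c-cedilla, euro sign, check mark, clapping hands
samples = ['some', '\u00e7', '\u20ac', '\u2705', '\U0001F44F']

byte_lengths = {w: len(w.encode('utf-8')) for w in samples}
flagged = [w for w in samples if contains_non_ascii(w)]

for w in samples:
    print(repr(w), len(w), byte_lengths[w])
```

Letters, currency signs and pictographs all take more than one byte, so this test alone cannot tell them apart.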

Attempt 2

Using the NLTK emoticon regular expression:

from nltk.tokenize.casual import EMOTICON_RE
EMOTICON_RE.findall(' '.join(lst))

However, EMOTICON_RE can only extract text expressions such as :) :/ :(

Here is the list of what I consider to be emoticons.

I tried to build a list of emoticons so I could check whether each word is in it, but I could not build such a list from Unicode character codes.

Can you please suggest an approach?

moth asked Apr 20 '26 09:04



1 Answer

I think all of those characters are in the Unicode general category "Symbol, other" (So). Therefore you can do:

import unicodedata
[w for w in lst if any(unicodedata.category(c) == 'So' for c in w)]
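
As a self-contained sketch of the category check, the following uses only the standard-library `unicodedata` module. The helper name `has_other_symbol` and the shortened sample list (written with escapes so the code points are unambiguous) are illustrative assumptions, not part of the answer.

```python
import unicodedata

def has_other_symbol(word):
    """Return True if any character has general category 'So'."""
    return any(unicodedata.category(ch) == 'So' for ch in word)

words = [
    '\u2705',            # check mark
    't\u00fcrk\u00e7e',  # word with non-ASCII letters
    'some', '#', '@one', '', '41',
    '\u23f1',            # stopwatch
    '\U0001F44F',        # clapping hands
    '\u20ba', '\u20ac',  # currency signs (category 'Sc')
    ':)', ':/',
]

symbols = [w for w in words if has_other_symbol(w)]
print(symbols)
```

Letters, digits, punctuation and currency signs are filtered out, and the empty string is skipped because `any()` over no characters is False. The So category is broader than emoji, though: the copyright sign U+00A9, for example, is also in it, so a stricter filter may be needed for other data.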
pacholik answered Apr 21 '26 22:04



