How to search for emoticons/emoji in Elasticsearch?

I am trying to search Elasticsearch for text containing emoticons/emoji. I have already indexed tweets in ES, and now I want to find, for example, tweets containing smiley or sad faces. I tried the following:

1) I used the Unicode escape for a smiley, but it didn't work: no results were returned.

GET /myindex/twitter_stream/_search
{
  "query": {
    "match": {
      "text": "\u1f603"
    }
  }
}
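A likely reason the query above finds nothing: a JSON `\u` escape takes exactly four hex digits, so `"\u1f603"` decodes to U+1F60 followed by the character `3`, not to 😃 (U+1F603). Characters outside the Basic Multilingual Plane have to be written either literally or as a UTF-16 surrogate pair. A quick check with Python's standard `json` module (even with a correct escape, results still depend on whether the field's analyzer keeps emoji tokens):

```python
import json

# "\u1f603" is read as the escape \u1f60 followed by a literal "3"
wrong = json.loads('"\\u1f603"')
print([hex(ord(c)) for c in wrong])  # ['0x1f60', '0x33']

# U+1F603 has to be written as a UTF-16 surrogate pair inside a JSON escape
right = json.loads('"\\ud83d\\ude03"')
print(right == "\U0001F603")  # True

# json.dumps (ensure_ascii=True by default) emits the surrogate-pair form for you
print(json.dumps({"query": {"match": {"text": "\U0001F603"}}}))
```

Building the request body with a JSON library, or pasting the emoji character itself, avoids hand-writing the escape.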

How do I set up emoji search in Elasticsearch? Do I have to encode the raw tweets before ingesting them into Elasticsearch? What would the query be? Has anyone found an approach that works? Thanks.

Gautam S. Thakur asked Jan 05 '16 18:01


2 Answers

The Unicode emoji specification explains how to search for emoji:

Searching includes both searching for emoji characters in queries, and finding emoji characters in the target. These are most useful when they include the annotations as synonyms or hints. For example, when someone searches for ⛽︎ on yelp.com, they see matches for “gas station”. Conversely, searching for “gas pump” in a search engine could find pages containing ⛽︎.

Annotations are language-specific: searching on yelp.de, someone would expect a search for ⛽︎ to result in matches for “Tankstelle”.

You can keep the real Unicode character and expand it to its annotations in each language you aim to support.
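One way to express "keep the character, add per-language annotations" is one synonym filter per language. The sketch below only builds Elasticsearch analysis settings as a Python dict; the annotation table is a small hand-written sample (not CLDR data), and the names `emoji_synonyms_en`, `emoji_en`, etc. are hypothetical, as is the choice of the `whitespace` tokenizer. It is not the configuration from the linked article.

```python
import json

# Hypothetical, hand-written annotations; a real setup would load CLDR annotation files
ANNOTATIONS = {
    "en": {"\u26fd": ["gas pump", "gas station"], "\U0001F603": ["smile", "happy"]},
    "de": {"\u26fd": ["Zapfsäule", "Tankstelle"], "\U0001F603": ["Lächeln"]},
}

def synonym_rules(annotations):
    """Turn {emoji: [annotation, ...]} into explicit rules that keep the emoji itself."""
    return [f"{emoji} => {emoji}, {', '.join(words)}" for emoji, words in annotations.items()]

def analysis_settings(languages):
    """Build one synonym filter and one custom analyzer per language (hypothetical names)."""
    filters, analyzers = {}, {}
    for lang in languages:
        filters[f"emoji_synonyms_{lang}"] = {
            "type": "synonym",
            "synonyms": synonym_rules(ANNOTATIONS[lang]),
        }
        analyzers[f"emoji_{lang}"] = {
            "type": "custom",
            "tokenizer": "whitespace",  # split on whitespace only, so emoji stay as tokens
            "filter": ["lowercase", f"emoji_synonyms_{lang}"],
        }
    return {"settings": {"analysis": {"filter": filters, "analyzer": analyzers}}}

print(json.dumps(analysis_settings(["en", "de"]), ensure_ascii=False, indent=2))
```

A field indexed with `emoji_de` would then match "Tankstelle" for a tweet containing ⛽. Variation selectors such as U+FE0E attached to the emoji would still have to be stripped for these rules to match.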

This can be done with a synonym filter. However, the Elasticsearch standard tokenizer removes emoji, so there is quite a lot of work to do:

  • remove emoji modifiers and clean everything up;
  • tokenize via whitespace;
  • remove undesired punctuation;
  • expand the emoji to their synonyms.
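The four steps can be imitated in plain Python to see which tokens come out. This is only an illustration of the pipeline, not Elasticsearch configuration and not the linked article's code; the `SYNONYMS` table is a hypothetical sample, and multi-word annotations are emitted as separate words for simplicity.

```python
import re
import unicodedata

# Step 1 targets: variation selectors U+FE0E/U+FE0F and skin-tone modifiers U+1F3FB..U+1F3FF
MODIFIERS = re.compile("[\uFE0E\uFE0F\U0001F3FB-\U0001F3FF]")

# Hypothetical English annotations used for step 4
SYNONYMS = {"\u26fd": ["gas", "pump"], "\U0001F603": ["smile"], "\U0001F44D": ["thumbs", "up"]}

def strip_punctuation(token):
    """Step 3: drop leading/trailing Unicode punctuation (categories P*).
    Emoji are symbols (category So), so they are kept."""
    start, end = 0, len(token)
    while start < end and unicodedata.category(token[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(token[end - 1]).startswith("P"):
        end -= 1
    return token[start:end]

def analyze(text):
    text = MODIFIERS.sub("", text)                   # step 1: remove modifiers
    tokens = text.split()                            # step 2: whitespace tokenization
    tokens = [strip_punctuation(t) for t in tokens]  # step 3: remove punctuation
    out = []
    for t in tokens:
        if not t:
            continue
        out.append(t.lower())
        out.extend(SYNONYMS.get(t, []))              # step 4: expand emoji to annotations
    return out

print(analyze("Great day! \U0001F603 \U0001F44D\U0001F3FD"))
# ['great', 'day', '😃', 'smile', '👍', 'thumbs', 'up']
```

Splitting on whitespace alone leaves an emoji glued to a word (`day😃`) inside a single token, so a real analyzer chain needs an extra step for that case.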

The whole process is described here: http://jolicode.com/blog/search-for-emoji-with-elasticsearch (disclaimer: I'm the author).

Damien answered Oct 11 '22 08:10


In the setups I have seen, emoticons are stored in the database as strings in place of their image counterparts. For example, a smile is stored as :smile:. Check whether that is the case for your data. If it is, you can add a custom tokenizer that does not split on colons, so that the emoticon codes can be matched exactly. Then, when searching, you just need to convert the emoticon in the query to the corresponding string, and Elasticsearch will be able to find it. Hope it helps.
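A minimal sketch of this idea, assuming the application converts emoji to shortcodes itself, both before indexing and before querying. The `SHORTCODES` table and the analyzer name `shortcode_text` are hypothetical (shortcode names differ between platforms). Elasticsearch's built-in `whitespace` tokenizer splits only on whitespace, so a token such as `:smile:` survives intact; the `text` field would need to be mapped to use this analyzer.

```python
import json

# Hypothetical emoji -> shortcode table; real shortcode sets vary by platform
SHORTCODES = {"\U0001F604": ":smile:", "\U0001F61E": ":disappointed:"}

def to_shortcodes(text):
    """Replace each known emoji with its shortcode, padded with spaces so it becomes its own token."""
    for emoji, code in SHORTCODES.items():
        text = text.replace(emoji, f" {code} ")
    return " ".join(text.split())  # collapse the extra whitespace

# Analysis settings: a custom analyzer on the whitespace tokenizer does not split on colons
SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "shortcode_text": {"type": "custom", "tokenizer": "whitespace", "filter": ["lowercase"]}
            }
        }
    }
}

# Apply the same conversion to the document and to the query string
doc = {"text": to_shortcodes("Finally home\U0001F604")}
query = {"query": {"match": {"text": to_shortcodes("\U0001F604")}}}
print(json.dumps(doc))    # {"text": "Finally home :smile:"}
print(json.dumps(query))  # {"query": {"match": {"text": ":smile:"}}}
```

The important design point is symmetry: if documents and queries go through the same conversion, the exact shortcode token matches regardless of how the emoji was originally encoded.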

udit mittal answered Oct 11 '22 06:10