I am trying to search emoticon/emoji containing text in elasticsearch. Earlier, I have inserted tweets in ES. Now I want to search for example smile or sad faces related tweets. I tried the following
1) used equivalent of unicode values of smile, but didn't work. No results were returned.
GET /myindex/twitter_stream/_search
{
"query": {
"match": {
"text": "\u1f603"
}
}
}
How to set up emoji search in elasticsearch? Do, I have to encode raw tweets before ingesting into elasticsearch? What would be the query ? Any experienced approaches? Thanks.
Now that has changed — You can now search on Google desktop or mobile using emoji characters.
Simply click on any text field, then press Command + Control + Space on your keyboard. A panel will open, and you can choose your emoji from the list. A simple click will add the emoji to your text.
agreement or approval (➕, 👍) kudos (👏, 🙌) understanding (👌)
The specification explain how to search for emoji:
Searching includes both searching for emoji characters in queries, and finding emoji characters in the target. These are most useful when they include the annotations as synonyms or hints. For example, when someone searches for ⛽︎ on yelp.com, they see matches for “gas station”. Conversely, searching for “gas pump” in a search engine could find pages containing ⛽︎.
Annotations are language-specific: searching on yelp.de, someone would expect a search for ⛽︎ to result in matches for “Tankstelle”.
You can keep the real unicode char, and expand it to it annotation in each language you aim to support.
This can be done with a synonym filter. But Elasticsearch standard tokenizer will remove the emoji, so there is quite a lot of work to do:
The whole process is described here: http://jolicode.com/blog/search-for-emoji-with-elasticsearch (disclaimer: I'm the author).
The way I have seen emoticons work is actually a string is stored in place of there image counterparts when you are storing them in a database. For eg. A smile is stored as :smile:. You can verify that in your case. If this is the case, you can add a custom tokenizer which does not tokenize on colons so that an exact match for the emoticons can be made. Then while searching you just need to convert the emoticon image in search to appropriate string and elasticsearch will be able to find it. Hope it helps
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With