I am processing tweets from the Twitter API, and a lot of the tweets contain emojis. I'm trying to keep track of the most used emojis, but I'm having trouble actually identifying them.
I'm using: https://github.com/iamcal/emoji-data to identify emojis.
I have no idea how to figure out whether a string contains an emoji. I have tried using a regex built from the emoji-data 'unified' field, and I have tried simply checking whether the string contains that field. I'm really not sure how to check for emojis. Any help would be appreciated.
val pattern = new Regex("(${a.unified})")
(pattern findAllIn text).mkString(",")
This is what I have tried using regex. This doesn't find any emojis. I have also tried adding a \u before the unified fields from the emoji-data, but that doesn't help.
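One likely reason the attempt above finds nothing: the emoji-data 'unified' field holds hexadecimal code points as plain text (for example 1F602, or several joined by hyphens such as 1F1FA-1F1F8), so a pattern built from it searches for those literal letters and digits rather than the emoji itself. The following is a minimal sketch, assuming that hyphen-separated hex format; the object and method names are made up for illustration and are not part of emoji-data.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

object UnifiedEmoji {
  // Turn an emoji-data "unified" value such as "1F602" or "1F1FA-1F1F8"
  // into the actual string those code points represent.
  // Assumes the value is hex code points separated by '-'.
  def unifiedToString(unified: String): String =
    unified
      .split("-")
      .map(hex => Integer.parseInt(hex, 16))        // hex text -> code point
      .map(cp => new String(Character.toChars(cp))) // code point -> UTF-16 string
      .mkString

  // Build a regex that matches exactly that emoji. Pattern.quote guards
  // against regex metacharacters such as '*' (used by a keycap emoji).
  def emojiRegex(unified: String): Regex =
    Pattern.quote(unifiedToString(unified)).r
}
```

With this, UnifiedEmoji.emojiRegex(a.unified).findAllIn(text) should yield one match per occurrence of that emoji in the tweet.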
The best place to go to find out what an emoji is supposed to mean is a website called Emojipedia. This is a reference site that houses every single emoji you can use, including gender and skin tone duplicates. To get started, navigate to Emojipedia in the browser on your phone or computer.
Here's how it works. In the Unicode Standard, each emoji is represented as a "code point" (a hexadecimal number) that looks like U+1F603, for example. Thanks to Unicode, devices all over the world can agree that U+1F603 is the code point that produces a grinning face.
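As a small illustration of the code point idea, here is a sketch that builds the emoji from the number U+1F603 and formats a code point back into U+ notation. The object and method names are hypothetical; only the java.lang.Character and String calls are standard.

```scala
object CodePointDemo {
  // U+1F603 written as a hexadecimal Int
  val codePoint: Int = 0x1F603

  // Character.toChars returns the UTF-16 code units for the code point
  val emoji: String = new String(Character.toChars(codePoint))

  // Format any code point in the usual U+XXXX notation
  def notation(cp: Int): String = f"U+$cp%04X"
}
```

Note that CodePointDemo.emoji.length is 2 on the JVM, because the code point is stored as a surrogate pair, while emoji.codePointAt(0) returns the original number.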
Emojis look like images, or icons, but they are not. They are characters from the Unicode character set, which covers almost all of the characters and symbols in the world. UTF-8 is one of the encodings used to store Unicode characters as bytes.
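To make the difference between a character and its encoded bytes concrete, the sketch below takes a single emoji and shows how many UTF-8 bytes, UTF-16 code units, and code points it occupies. The object name is illustrative only.

```scala
import java.nio.charset.StandardCharsets

object EncodingDemo {
  // U+1F603 written as a UTF-16 surrogate pair
  val grin: String = "\uD83D\uDE03"

  // The same character encoded as UTF-8 bytes
  val utf8Bytes: Array[Byte] = grin.getBytes(StandardCharsets.UTF_8)

  // Number of UTF-16 code units (what String.length reports)
  val utf16Units: Int = grin.length

  // Number of actual Unicode code points
  val codePoints: Int = grin.codePointCount(0, grin.length)
}
```

Here the one emoji is a single code point that takes four bytes in UTF-8 and two code units in a JVM string, which is why naive per-Char processing tends to split emojis in half.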
You can use the following regex to find emoji characters (and any other characters outside the Unicode Basic Multilingual Plane):
[^\u0000-\uFFFF]
For example, we use the following code to filter out emojis from strings:
"some string".replaceAll("[^\u0000-\uFFFF]", "");
Hope that helps.
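Since the question is in Scala, the same pattern can be sketched there as follows. Keep in mind that this is a heuristic: it also matches non-emoji characters above U+FFFF, and it misses emojis that live inside the Basic Multilingual Plane, such as U+2764 (heavy black heart). The object and method names are hypothetical.

```scala
object NonBmp {
  // Matches any single code point above U+FFFF (outside the BMP).
  private val nonBmp = """[^\u0000-\uFFFF]""".r

  // Return every non-BMP character found in the text, in order.
  def extract(text: String): List[String] = nonBmp.findAllIn(text).toList

  // Remove every non-BMP character from the text.
  def strip(text: String): String = nonBmp.replaceAllIn(text, "")
}
```

NonBmp.extract(text) gives the candidates to count, and NonBmp.strip(text) mirrors the replaceAll call above.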
Your code is close to working. To extract the emojis from text, try:
"""\p{block=Emoticons}""".r.findAllIn(text).mkString
For example:
scala> val text = "Use regex and now you have two problems 😂 😆"
scala> """\p{block=Emoticons}""".r.findAllIn(text).mkString
res0: String = 😂😆
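To get from extraction to the "most used emojis" the question asks about, the matches can be grouped and counted. This is a rough sketch, not a complete solution: the Emoticons block only covers U+1F600 to U+1F64F, so many emojis (hearts, flags, animals, and so on) are not matched by this pattern. The object and method names are made up for illustration.

```scala
object EmojiCounter {
  private val emoticon = """\p{block=Emoticons}""".r

  // Count occurrences of each matched emoji across all tweets,
  // most frequent first.
  def topEmojis(tweets: Seq[String]): Seq[(String, Int)] =
    tweets
      .flatMap(t => emoticon.findAllIn(t).toList) // all matches in every tweet
      .groupBy(identity)                          // emoji -> its occurrences
      .map { case (emoji, hits) => emoji -> hits.size }
      .toSeq
      .sortBy { case (_, count) => -count }
}
```

Calling EmojiCounter.topEmojis(tweets).take(10) would then give the ten most frequent matches among the tweets processed so far.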