Android - How to filter emoji (emoticons) from a string?

Tags: android, emoji

I'm working on an Android app, and I do not want people to use emoji in the input.

How can I remove emoji characters from a string?

asked Mar 04 '14 17:03 by Jochem Kuijpers




2 Answers

Emoji can be found in the following ranges (source):

  • U+2190 to U+21FF
  • U+2600 to U+26FF
  • U+2700 to U+27BF
  • U+3000 to U+303F
  • U+1F300 to U+1F64F
  • U+1F680 to U+1F6FF

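You can use this line in your code to filter them all at once. Code points above U+FFFF are written as \x{...} because \u accepts only four hex digits:

text = text.replaceAll("[\\u2190-\\u21FF\\u2600-\\u26FF\\u2700-\\u27BF\\u3000-\\u303F\\x{1F300}-\\x{1F64F}\\x{1F680}-\\x{1F6FF}]", "");

Note that these ranges are broader than emoji alone: U+2190 to U+21FF is the Arrows block and U+3000 to U+303F is CJK Symbols and Punctuation, so ordinary arrows and CJK punctuation are removed as well.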

answered Nov 12 '22 13:11 by Faez Mehrabani


The latest emoji data can be found here:

http://unicode.org/Public/emoji/

There is one folder per emoji version. As an app developer, it is a good idea to use the latest version available.

Each folder contains text files. Check emoji-data.txt, which lists all standard emoji code points.

Emoji are spread over many small code point ranges. For the best coverage, check all of them in your app.
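
The lookup could be built by parsing emoji-data.txt once at startup. Below is a minimal sketch, assuming each data line has the form "code point or first..last range ; property # comment", which is the layout used by recent versions of the file. The class and method names are hypothetical. Note that the plain Emoji property also covers the digits 0-9, # and *, so a filter would probably want a narrower property such as Emoji_Presentation.

```java
import java.util.ArrayList;
import java.util.List;

final class EmojiDataParser {
    /** Collects the [first, last] code point ranges tagged with the given property. */
    static List<int[]> parseRanges(String data, String property) {
        List<int[]> ranges = new ArrayList<>();
        for (String line : data.split("\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) line = line.substring(0, hash); // drop trailing comment
            String[] fields = line.split(";");
            // Skip blank lines, comment-only lines and other properties.
            if (fields.length < 2 || !fields[1].trim().equals(property)) continue;
            String[] ends = fields[0].trim().split("\\.\\.");
            int first = Integer.parseInt(ends[0], 16);
            int last = ends.length > 1 ? Integer.parseInt(ends[1], 16) : first;
            ranges.add(new int[] {first, last});
        }
        return ranges;
    }

    /** Linear scan; fine for a sketch, a sorted array would be faster. */
    static boolean contains(List<int[]> ranges, int codePoint) {
        for (int[] r : ranges) {
            if (codePoint >= r[0] && codePoint <= r[1]) return true;
        }
        return false;
    }
}
```

The file contents would be read into a String first (for example from the app's assets) and passed in as data.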

Some people ask why there are 5-digit codes when only 4 hex digits can follow \u. Code points above U+FFFF are encoded in UTF-16 as surrogate pairs, so most emoji take two char values.
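
To make the surrogate-pair arithmetic concrete, here is a small sketch of both directions. The class and method names are hypothetical; the JDK already offers the same operations as Character.toChars and Character.toCodePoint.

```java
final class SurrogateDemo {
    /** Splits a code point above U+FFFF into its two UTF-16 code units. */
    static char[] toSurrogates(int codePoint) {
        int offset = codePoint - 0x10000;
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] {high, low};
    }

    /** Recombines a surrogate pair into the original code point. */
    static int toCodePoint(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }
}
```

For U+1F600 this yields 0xD83D followed by 0xDE00, which is why that emoji is written as "\uD83D\uDE00" in Java source.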

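For example, take a string:

String s = ...;

Get its UTF-16 representation (passing a Charset avoids the checked UnsupportedEncodingException):

byte[] utf16 = s.getBytes(Charset.forName("UTF-16BE"));

Iterate over the bytes two at a time, keeping track of the current surrogate pair:

char high = 0, low = 0;
for(int i = 0; i < utf16.length; i += 2) {

Combine two bytes into one char:

char c = (char)((char)(utf16[i] & 0xff) << 8 | (char)(utf16[i + 1] & 0xff));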

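Now check for surrogate pairs. Emoji above U+FFFF are located in Plane 1 (the Supplementary Multilingual Plane), whose high surrogates fall in the range 0xd800..0xd83f. Checking the full high-surrogate range 0xd800..0xdbff instead also keeps pairs from other planes together, so a later low surrogate is never combined with a stale high value.

if(c >= 0xd800 && c <= 0xdbff) {
    high = c;
    continue;
}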

The second part of a surrogate pair lies in the range 0xdc00..0xdfff. The pair can now be converted to a single code point, which may need 5 hex digits.

else if(c >= 0xdc00 && c <= 0xdfff) {
    low = c;
    long unicode = (((long)high - 0xd800) * 0x400) + ((long)low - 0xdc00) + 0x10000;
}

All other chars are not part of a pair, so process them as they are.

else {
    long unicode = c;
}

Now use the data from emoji-data.txt to check whether the code point is an emoji. If it is, skip it. If not, copy its bytes to the output byte array.

Finally, the byte array is converted back to a String:

String out = new String(outarray, Charset.forName("UTF-16BE"));
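
The same filtering can be written more compactly. This is a minimal sketch that swaps the manual byte loop for String.codePointAt, which joins surrogate pairs itself; it is an alternative technique, not the byte-array method described above. The isEmoji check is a hypothetical placeholder covering only two ranges; a real app would look the code point up in the ranges taken from emoji-data.txt.

```java
final class EmojiStripper {
    // Hypothetical placeholder for the emoji-data.txt lookup.
    static boolean isEmoji(int codePoint) {
        return (codePoint >= 0x1F300 && codePoint <= 0x1F64F)
            || (codePoint >= 0x1F680 && codePoint <= 0x1F6FF);
    }

    /** Returns s with every code point accepted by isEmoji removed. */
    static String strip(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);       // joins a surrogate pair if present
            if (!isEmoji(cp)) out.appendCodePoint(cp);
            i += Character.charCount(cp);    // advance by 1 or 2 chars
        }
        return out.toString();
    }
}
```

Working on code points rather than bytes also means no output byte array or charset conversion is needed.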
answered Nov 12 '22 12:11 by NoAngel