Remove ✅, 🔥, ✈ , ♛ and other such emojis/images/signs from Java strings

Instead of blacklisting some elements, how about creating a whitelist of the characters you do wish to keep? This way you don't need to worry about every new emoji being added.

String characterFilter = "[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]";
String emotionless = aString.replaceAll(characterFilter,"");

So:

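[\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s] is a character class that matches letters (\\p{L}), marks (\\p{M}), numbers (\\p{N}), punctuation (\\p{P}), separators (\\p{Z}), format characters (\\p{Cf}), surrogate code points (\\p{Cs}), and whitespace such as tabs and newlines (\\s). \\p{L} covers letters from every script, such as Latin, Cyrillic, Kanji, etc. Java's regex engine matches by code point, so a properly paired surrogate such as 🔥 is treated as one supplementary character (a symbol) and is still removed; \\p{Cs} only keeps unpaired surrogates.
The ^ at the start of the character class negates it, so replaceAll removes every character outside the whitelist.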

Example:

String str = "hello world _# 皆さん、こんにちは！　私はジョンと申します。🔥";
System.out.print(str.replaceAll("[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]",""));
// Output:
//   "hello world _# 皆さん、こんにちは！　私はジョンと申します。"

If you need more information, check out the Java documentation for regexes.
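
If the filter is applied in many places, it may be worth compiling the pattern once. The following is only a sketch of that idea: the class name SymbolFilter and the method name strip are made up for illustration, and the test strings use \u escapes so the source compiles regardless of file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption, not part of the answer above.
final class SymbolFilter {
    // Same whitelist as above, compiled once so it can be reused across calls.
    private static final Pattern NOT_ALLOWED =
            Pattern.compile("[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]");

    // Removes every code point that is not covered by the whitelist.
    static String strip(String text) {
        return NOT_ALLOWED.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("I'm on \uD83D\uDD25")); // U+1F525 FIRE is removed
        System.out.println(strip("\u2705 Vi sign"));      // U+2705 CHECK MARK is removed
        System.out.println(strip("a+b=$5"));              // math and currency symbols go too
    }
}
```

Note that this whitelist has no symbol categories at all, so ordinary characters such as +, = and $ are stripped along with the emoji. Adding \\p{Sm} or \\p{Sc} to the class should keep them if that matters for your input.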

I'm not super into Java, so I won't try to write example code inline, but the way I would do this is to check what Unicode calls "the general category" of each character. There are several letter and punctuation categories.

You can use Character.getType to find the general category of a given character. You should probably retain those characters that fall in these general categories:

COMBINING_SPACING_MARK
CONNECTOR_PUNCTUATION
CURRENCY_SYMBOL
DASH_PUNCTUATION
DECIMAL_DIGIT_NUMBER
ENCLOSING_MARK
END_PUNCTUATION
FINAL_QUOTE_PUNCTUATION
FORMAT
INITIAL_QUOTE_PUNCTUATION
LETTER_NUMBER
LINE_SEPARATOR
LOWERCASE_LETTER
MATH_SYMBOL
MODIFIER_LETTER
MODIFIER_SYMBOL
NON_SPACING_MARK
OTHER_LETTER
OTHER_NUMBER
OTHER_PUNCTUATION
PARAGRAPH_SEPARATOR
SPACE_SEPARATOR
START_PUNCTUATION
TITLECASE_LETTER
UPPERCASE_LETTER

(All of the characters you listed as specifically wanting to remove have general category OTHER_SYMBOL, which I did not include in the above category whitelist.)
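
For readers who want to see the category whitelist above in code, here is a minimal sketch. The class name CategoryFilter and its method names are assumptions made for illustration; only Character.getType and the category constants come from the JDK.

```java
// Hypothetical helper class; the name is an assumption, not part of the answer above.
final class CategoryFilter {
    // Returns true when the code point's general category is on the whitelist above.
    static boolean isAllowed(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.COMBINING_SPACING_MARK:
            case Character.CONNECTOR_PUNCTUATION:
            case Character.CURRENCY_SYMBOL:
            case Character.DASH_PUNCTUATION:
            case Character.DECIMAL_DIGIT_NUMBER:
            case Character.ENCLOSING_MARK:
            case Character.END_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.FORMAT:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.LETTER_NUMBER:
            case Character.LINE_SEPARATOR:
            case Character.LOWERCASE_LETTER:
            case Character.MATH_SYMBOL:
            case Character.MODIFIER_LETTER:
            case Character.MODIFIER_SYMBOL:
            case Character.NON_SPACING_MARK:
            case Character.OTHER_LETTER:
            case Character.OTHER_NUMBER:
            case Character.OTHER_PUNCTUATION:
            case Character.PARAGRAPH_SEPARATOR:
            case Character.SPACE_SEPARATOR:
            case Character.START_PUNCTUATION:
            case Character.TITLECASE_LETTER:
            case Character.UPPERCASE_LETTER:
                return true;
            default:
                // OTHER_SYMBOL, CONTROL, PRIVATE_USE, SURROGATE, UNASSIGNED
                return false;
        }
    }

    // Walks the string by code point (not by char) and keeps the allowed ones.
    static String strip(String text) {
        StringBuilder kept = new StringBuilder(text.length());
        text.codePoints().filter(CategoryFilter::isAllowed).forEach(kept::appendCodePoint);
        return kept.toString();
    }
}
```

One consequence of the list as written: CONTROL is not on it, so tabs and line breaks are removed as well. If those should survive, adding a case for Character.CONTROL (or a Character.isWhitespace check) would be one way to keep them.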

Based on the Full Emoji List, v11.0, there are 1644 different Unicode code points to remove. For example, ✅ is on this list as U+2705.

Once you have the full list of emojis, you need to filter them out by code point. Iterating over single chars or bytes won't work, because a single code point can span multiple bytes. Because Java strings use UTF-16, emojis will often take two chars.

String input = "ab✅cd";
for (int i = 0; i < input.length();) {
  int cp = input.codePointAt(i);
  // filter out if matches
  i += Character.charCount(cp); 
}

Mapping from Unicode code point U+2705 to Java int is straightforward:

int viSign = 0x2705;

or since Java supports Unicode Strings:

int viSign = "✅".codePointAt(0);
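
Putting the loop and the code point mapping together, a filter might look like the sketch below. The class name CodePointBlocklist and the two-entry SAMPLE set are placeholders invented for illustration; a real filter would load the full list of code points from the emoji data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class; the name and the sample set are assumptions.
final class CodePointBlocklist {
    // Tiny stand-in for the full emoji list: U+2705 CHECK MARK and U+1F525 FIRE.
    static final Set<Integer> SAMPLE = new HashSet<>(Arrays.asList(0x2705, 0x1F525));

    // Copies the input code point by code point, skipping the blocked ones.
    static String remove(String input, Set<Integer> blocked) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (!blocked.contains(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // 1 char in the BMP, 2 for supplementary code points
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(remove("ab\u2705cd", SAMPLE));          // prints "abcd"
        System.out.println(remove("I'm on \uD83D\uDD25", SAMPLE)); // prints "I'm on "
    }
}
```

Keep in mind that some emoji are sequences of several code points (for example with a variation selector or a zero-width joiner), so a per-code-point list can leave invisible leftovers such as U+FE0F behind unless those are listed too.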

ICU4J is your friend.

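UCharacter.hasBinaryProperty(codePoint, UProperty.EMOJI);

Remember to keep your version of icu4j up to date, and note that this will only filter out official Unicode emoji, not other symbol characters. Combine it with filtering out other character types as desired. Also note that the Emoji property is true for the ASCII digits, # and *, so exclude those from the check if you want to keep them.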

More information: http://icu-project.org/apiref/icu4j/com/ibm/icu/lang/UProperty.html#EMOJI

I gave some examples below, and thought that Latin is enough, but...

Is there a way to remove all these signs from the input string and keeping only the letters & punctuation in the different languages?

After the edit, I developed a new solution using the Character.getType method, and that appears to be the best shot at this.

package zmarcos.emoji;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TestEmoji {

    public static void main(String[] args) {
        String[] arr = {"Remove ✅, 🔥, ✈ , ♛ and other such signs from Java string",
            "→ Cats and dogs",
            "I'm on 🔥",
            "Apples ⚛ ",
            "✅ Vi sign",
            "♛ I'm the king ♛ ",
            "Star me ★",
            "Star ⭐ once more",
            "早上好 ♛",
            "Καλημέρα ✂"};
        System.out.println("---only letters and spaces alike---\n");
        for (String input : arr) {
            int[] filtered = input.codePoints().filter((cp) -> Character.isLetter(cp) || Character.isWhitespace(cp)).toArray();
            String result = new String(filtered, 0, filtered.length);
            System.out.println(input);
            System.out.println(result);
        }

        System.out.println("\n---unicode blocks white---\n");
        Set<Character.UnicodeBlock> whiteList = new HashSet<>();
        whiteList.add(Character.UnicodeBlock.BASIC_LATIN);
        for (String input : arr) {
            int[] filtered = input.codePoints().filter((cp) -> whiteList.contains(Character.UnicodeBlock.of(cp))).toArray();
            String result = new String(filtered, 0, filtered.length);
            System.out.println(input);
            System.out.println(result);
        }

        System.out.println("\n---unicode blocks black---\n");
        Set<Character.UnicodeBlock> blackList = new HashSet<>();        
        blackList.add(Character.UnicodeBlock.EMOTICONS);
        blackList.add(Character.UnicodeBlock.MISCELLANEOUS_TECHNICAL);
        blackList.add(Character.UnicodeBlock.MISCELLANEOUS_SYMBOLS);
        blackList.add(Character.UnicodeBlock.MISCELLANEOUS_SYMBOLS_AND_ARROWS);
        blackList.add(Character.UnicodeBlock.MISCELLANEOUS_SYMBOLS_AND_PICTOGRAPHS);
        blackList.add(Character.UnicodeBlock.ALCHEMICAL_SYMBOLS);
        blackList.add(Character.UnicodeBlock.TRANSPORT_AND_MAP_SYMBOLS);
        blackList.add(Character.UnicodeBlock.GEOMETRIC_SHAPES);
        blackList.add(Character.UnicodeBlock.DINGBATS);
        for (String input : arr) {
            int[] filtered = input.codePoints().filter((cp) -> !blackList.contains(Character.UnicodeBlock.of(cp))).toArray();
            String result = new String(filtered, 0, filtered.length);
            System.out.println(input);
            System.out.println(result);
        }
        System.out.println("\n---category---\n");
        int[] category = {Character.COMBINING_SPACING_MARK, Character.CONNECTOR_PUNCTUATION, /*Character.CONTROL,*/ Character.CURRENCY_SYMBOL,
            Character.DASH_PUNCTUATION, Character.DECIMAL_DIGIT_NUMBER, Character.ENCLOSING_MARK, Character.END_PUNCTUATION, Character.FINAL_QUOTE_PUNCTUATION,
            /*Character.FORMAT,*/ Character.INITIAL_QUOTE_PUNCTUATION, Character.LETTER_NUMBER, Character.LINE_SEPARATOR, Character.LOWERCASE_LETTER,
            /*Character.MATH_SYMBOL,*/ Character.MODIFIER_LETTER, /*Character.MODIFIER_SYMBOL,*/ Character.NON_SPACING_MARK, Character.OTHER_LETTER, Character.OTHER_NUMBER,
            Character.OTHER_PUNCTUATION, /*Character.OTHER_SYMBOL,*/ Character.PARAGRAPH_SEPARATOR, /*Character.PRIVATE_USE,*/
            Character.SPACE_SEPARATOR, Character.START_PUNCTUATION, /*Character.SURROGATE,*/ Character.TITLECASE_LETTER, /*Character.UNASSIGNED,*/ Character.UPPERCASE_LETTER};
        Arrays.sort(category);
        for (String input : arr) {
            int[] filtered = input.codePoints().filter((cp) -> Arrays.binarySearch(category, Character.getType(cp)) >= 0).toArray();
            String result = new String(filtered, 0, filtered.length);
            System.out.println(input);
            System.out.println(result);
        }
    }

}

Output:

---only letters and spaces alike---

Remove ✅, 🔥, ✈ , ♛ and other such signs from Java string
Remove      and other such signs from Java string
→ Cats and dogs
 Cats and dogs
I'm on 🔥
Im on 
Apples ⚛ 
Apples  
✅ Vi sign
 Vi sign
♛ I'm the king ♛ 
 Im the king  
Star me ★
Star me 
Star ⭐ once more
Star  once more
早上好 ♛
早上好 
Καλημέρα ✂
Καλημέρα 

---unicode blocks white---

Remove ✅, 🔥, ✈ , ♛ and other such signs from Java string
Remove , ,  ,  and other such signs from Java string
→ Cats and dogs
 Cats and dogs
I'm on 🔥
I'm on 
Apples ⚛ 
Apples  
✅ Vi sign
 Vi sign
♛ I'm the king ♛ 
 I'm the king  
Star me ★
Star me 
Star ⭐ once more
Star  once more
早上好 ♛

Καλημέρα ✂


---unicode blocks black---

Remove ✅, 🔥, ✈ , ♛ and other such signs from Java string
Remove , ,  ,  and other such signs from Java string
→ Cats and dogs
→ Cats and dogs
I'm on 🔥
I'm on 
Apples ⚛ 
Apples  
✅ Vi sign
 Vi sign
♛ I'm the king ♛ 
 I'm the king  
Star me ★
Star me 
Star ⭐ once more
Star  once more
早上好 ♛
早上好 
Καλημέρα ✂
Καλημέρα 

---category---

Remove ✅, 🔥, ✈ , ♛ and other such signs from Java string
Remove , ,  ,  and other such signs from Java string
→ Cats and dogs
 Cats and dogs
I'm on 🔥
I'm on 
Apples ⚛ 
Apples  
✅ Vi sign
 Vi sign
♛ I'm the king ♛ 
 I'm the king  
Star me ★
Star me 
Star ⭐ once more
Star  once more
早上好 ♛
早上好 
Καλημέρα ✂
Καλημέρα

The code works by streaming each String as code points, using a lambda to filter them into an int array, and then converting the array back to a String.

The letters and spaces filter uses the Character.isLetter and Character.isWhitespace methods. It also strips punctuation and digits ("I'm" becomes "Im"). Failed attempt.

The unicode blocks white filter keeps only the Unicode blocks the programmer specifies as allowed. With only BASIC_LATIN allowed, it drops the Chinese and Greek text. Failed attempt.

The unicode blocks black filter removes the Unicode blocks the programmer specifies as not allowed. Anything in an unlisted block gets through, such as the → from the Arrows block. Failed attempt.

The category filter uses the static method Character.getType. The programmer defines in the category array which types are allowed. WORKS😨😱😰😲😀.

Try this project: simple-emoji-4j

Compatible with Emoji 12.0 (2018.10.15)

Usage is simple:

EmojiUtils.removeEmoji(str)

Tags:

java

string

emoji
