What is the best and most efficient way to filter out all Unicode punctuation characters and symbols such as ✀ ✁ ✂ ✃ ✄ ✅ ✆ ✇ ✈ from a String? Simply filtering out every character that is not in a-z, A-Z or 0-9 is not an option, because I want to keep letters from other languages (ą, ę, ó, etc.). Thanks in advance.
You could use \p{L} to match all Unicode letters and remove everything that is neither a letter nor a digit. Example:
public class StripSymbols {
    public static void main(String[] args) {
        String[] test = {"asdEWR1", "ąęóöòæûùÜ", "sd,", "✀","✁","✂","✃","✄","✅","✆","✇","✈"};
        for (String s : test) {
            // Remove every char that is neither a Unicode letter (\p{L}) nor an ASCII digit (\d)
            System.out.println(s + " => " + s.replaceAll("[^\\p{L}\\d]", ""));
        }
    }
}
outputs:
asdEWR1 => asdEWR1
ąęóöòæûùÜ => ąęóöòæûùÜ
sd, => sd
✀ => 
✁ => 
✂ => 
✃ => 
✄ => 
✅ => 
✆ => 
✇ => 
✈ => 
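If the filter runs on many strings, a variant like the sketch below may be cheaper, because `String.replaceAll` compiles its regex on every call; here the pattern is compiled once into a constant. The names `UnicodeStripper`, `strip` and `NOT_LETTER_OR_DIGIT` are made up for illustration. It also assumes input may arrive in decomposed form (an accented letter stored as `e` plus U+0301 COMBINING ACUTE ACCENT), so it keeps combining marks with `\p{M}`; a `\p{L}`-only class would strip such accents. It uses `\p{Nd}` instead of `\d`, which keeps non-ASCII decimal digits (e.g. Arabic-Indic) as well; switch back to `\d` if only 0-9 should survive.

```java
import java.util.regex.Pattern;

public class UnicodeStripper {
    // Compiled once and reused: String.replaceAll compiles its regex on every call.
    // Keeps letters (\p{L}), combining marks (\p{M}) and decimal digits (\p{Nd});
    // everything else, including punctuation, spaces and dingbats, is removed.
    private static final Pattern NOT_LETTER_OR_DIGIT =
            Pattern.compile("[^\\p{L}\\p{M}\\p{Nd}]");

    public static String strip(String s) {
        return NOT_LETTER_OR_DIGIT.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Escapes keep the source ASCII-only: U+2702 and U+2705 are dingbat symbols.
        System.out.println(strip("sd, \u2702\u2705 123"));   // prints sd123
        // "e" + U+0301 COMBINING ACUTE ACCENT: the mark survives, so length is 5
        System.out.println(strip("cafe\u0301!").length());  // prints 5
    }
}
```

Whether the precompiled pattern is measurably faster depends on the workload, so it is worth benchmarking on real data.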
Alternatively, combine Unicode binary properties:
String fixed = value.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]", "");
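The same filter can also be written without a regex by testing each code point with `Character.isAlphabetic` and `Character.isDigit`, which should behave like `\p{IsAlphabetic}` and `\p{IsDigit}`. This is a minimal sketch; `AlphaDigitFilter` and `keepAlphabeticAndDigits` are hypothetical names. Iterating by code point rather than by `char` matters for characters outside the Basic Multilingual Plane, which Java stores as two surrogate `char`s.

```java
public class AlphaDigitFilter {
    // Regex-free filter meant to mirror [^\p{IsAlphabetic}\p{IsDigit}]:
    // keeps a code point only if Character.isAlphabetic or Character.isDigit accepts it.
    public static String keepAlphabeticAndDigits(String s) {
        StringBuilder out = new StringBuilder(s.length());
        // codePoints() yields whole code points, so a surrogate pair
        // (a character outside the BMP) is tested as one unit, not two chars.
        s.codePoints()
         .filter(cp -> Character.isAlphabetic(cp) || Character.isDigit(cp))
         .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // U+2708 AIRPLANE and U+2705 are symbols and are dropped, as is the space
        System.out.println(keepAlphabeticAndDigits("abc\u2708\u2705 42")); // prints abc42
    }
}
```

Note that "alphabetic" is slightly broader than `\p{L}`: it also covers letter numbers (category Nl) such as U+216B ROMAN NUMERAL TWELVE, and some combining marks that carry the Other_Alphabetic property, so this approach and the `\p{L}` one do not give identical results on every input.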