
Remove all special characters from a string while keeping non-Latin characters

I want to remove all special characters from a string, keeping only numbers and letters.

I am doing it like this:

text = text.replaceAll("[^a-zA-Z0-9 ]+", "");

The problem with this approach is that it also removes every letter outside a-z, such as è, é, ê, ë and many others.

By non-special characters (the ones I want to keep) I mean all digits and all alphabetic characters in every language, or at least in as many languages as possible.

How do I remove only the special characters?

Aki K asked Oct 01 '22 00:10



2 Answers

You can use the Unicode property classes \p{L} (any letter) and \p{N} (any numeric character):

text = text.replaceAll("[^\\p{L}\\p{N} ]+", "");
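
As a quick check of the pattern above, here is a minimal, self-contained sketch. The class name `KeepLettersAndDigits` and the helper `stripSpecial` are hypothetical names chosen for illustration; only the regular expression comes from the answer.

```java
class KeepLettersAndDigits {
    // Hypothetical helper: removes every character that is not a Unicode
    // letter (\p{L}), a Unicode number (\p{N}), or a plain space.
    static String stripSpecial(String text) {
        return text.replaceAll("[^\\p{L}\\p{N} ]+", "");
    }

    public static void main(String[] args) {
        // Accented Latin letters survive, punctuation does not.
        System.out.println(stripSpecial("Crème brûlée, #1!"));  // Crème brûlée 1
        // Letters from other scripts survive as well.
        System.out.println(stripSpecial("Привет, мир! 123"));   // Привет мир 123
    }
}
```

One caveat: if the input stores an accented letter in decomposed form (a base letter followed by a combining mark), the mark belongs to \p{M} rather than \p{L} and would be stripped. Adding \p{M} to the character class, or normalizing the text to NFC first, are two possible ways to handle that case.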
Sabuj Hassan answered Oct 12 '22 19:10



I know you asked for a regex, but if Guava is an option:

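// Recent Guava versions expose this matcher as the method CharMatcher.javaLetterOrDigit() instead of a constant.
CharMatcher.JAVA_LETTER_OR_DIGIT.retainFrom("èêAAAGRt123");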
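
If pulling in Guava is not desirable, a similar effect can be sketched with the JDK alone by filtering code points through `Character.isLetterOrDigit`. The class and method names below are hypothetical, and this is not claimed to behave identically to Guava's matcher in every edge case. Like the Guava call, it also drops spaces, unlike the regex in the question.

```java
class RetainLettersAndDigits {
    // Hypothetical helper: keeps only the code points that the JDK
    // classifies as a letter or a digit; everything else is dropped.
    static String retainLettersAndDigits(String text) {
        StringBuilder kept = new StringBuilder(text.length());
        text.codePoints()
            .filter(Character::isLetterOrDigit)
            .forEach(kept::appendCodePoint);
        return kept.toString();
    }

    public static void main(String[] args) {
        // Punctuation, underscores and spaces are removed.
        System.out.println(retainLettersAndDigits("èê-AAA GRt_123!"));  // èêAAAGRt123
    }
}
```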
Eugene answered Oct 12 '22 18:10
