Apparently Java's regex flavor treats umlauts and other special characters as non-"word characters".
"TESTÜTEST".replaceAll( "\\W", "" )
returns "TESTTEST" for me. I only want characters that are truly not word characters to be removed. Is there a way to do this without writing something along the lines of
"[^A-Za-z0-9äöüÄÖÜßéèáàúùóò]"
only to realize I forgot ô?
A common way to remove all non-alphanumeric characters from a String is with a regular expression. The pattern [^A-Za-z0-9] matches everything except ASCII letters and digits, so replacing its matches with "" keeps only ASCII alphanumerics (and therefore also removes Ü). You can also use the pattern [^\w], which in Java is by default equivalent to [^a-zA-Z_0-9]; unlike the first pattern, it keeps the underscore.
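A minimal sketch comparing the two patterns above on the string from the question (the class and method names are made up for illustration). The third method is an addition not mentioned in the text: it compiles the same [^\w] pattern with Pattern.UNICODE_CHARACTER_CLASS (available since Java 7, inline form (?U)), which makes \w follow Unicode rules so that Ü counts as a word character.

```java
import java.util.regex.Pattern;

public class WordCharDemo {
    // Explicit ASCII class: anything outside A-Z, a-z, 0-9 is removed (including '_' and Ü).
    static String keepAsciiAlnum(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "");
    }

    // Default \w in Java is [a-zA-Z_0-9], so this keeps '_' but still removes Ü.
    static String keepDefaultWord(String s) {
        return s.replaceAll("[^\\w]", "");
    }

    // UNICODE_CHARACTER_CLASS (equivalent to the inline flag (?U)) makes \w Unicode-aware,
    // so letters such as Ü and connector punctuation such as '_' are kept.
    static String keepUnicodeWord(String s) {
        return Pattern.compile("[^\\w]", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(s)
                .replaceAll("");
    }

    public static void main(String[] args) {
        String input = "TEST\u00DC_TEST"; // "TESTÜ_TEST"
        System.out.println(keepAsciiAlnum(input));  // TESTTEST
        System.out.println(keepDefaultWord(input)); // TEST_TEST
        System.out.println(keepUnicodeWord(input)); // TESTÜ_TEST
    }
}
```

The Unicode-aware \w still keeps the underscore, so it is not a drop-in replacement for a letters-and-digits-only filter.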
To remove every non-ASCII character instead (which also removes Ü):
string = string.replaceAll("[^\\p{ASCII}]", "");
Alternatively, you can call String.replace() in a loop, once per unwanted character, to remove all occurrences of each one from the string.
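A rough sketch of the loop approach just described, assuming the caller passes the unwanted characters as a String (the helper name removeChars is hypothetical). Note that it still requires listing characters by hand, which is the same maintenance problem the question is trying to avoid, just with a blacklist instead of a whitelist.

```java
public class ReplaceLoopDemo {
    // Hypothetical helper: removes every occurrence of each character in 'unwanted' from 's'.
    static String removeChars(String s, String unwanted) {
        String result = s;
        for (char c : unwanted.toCharArray()) {
            // String.replace(CharSequence, CharSequence) replaces all literal occurrences,
            // no regex involved.
            result = result.replace(String.valueOf(c), "");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(removeChars("T-E.S,T", "-.,")); // TEST
    }
}
```

Because Ü is not in the unwanted list, it survives, but any punctuation you forget to list survives too.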
Use [^\p{L}\p{Nd}]+: this matches all (Unicode) characters that are neither letters nor (decimal) digits.
In Java:
String resultString = subjectString.replaceAll("[^\\p{L}\\p{Nd}]+", "");
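A small self-contained sketch wrapping the one-liner above in a method (the class and method names are illustrative) and running it on the string from the question and on a sample with ô and é.

```java
public class UnicodeAlnumDemo {
    // Removes every run of characters that are neither Unicode letters (\p{L})
    // nor decimal digits (\p{Nd}); underscores and punctuation are removed too.
    static String keepLettersAndDigits(String s) {
        return s.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    }

    public static void main(String[] args) {
        System.out.println(keepLettersAndDigits("TEST\u00DCTEST"));                // TESTÜTEST
        System.out.println(keepLettersAndDigits("h\u00F4tel, 42 \u00E9t\u00E9!")); // hôtel42été
    }
}
```

One caveat: if the input is in decomposed form (for example o followed by a combining circumflex), the combining accent belongs to category M rather than L and will be stripped. Normalizing first with java.text.Normalizer.normalize(s, Normalizer.Form.NFC) avoids that for characters that have a precomposed form.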
Edit: I changed \p{N} to \p{Nd} because the former also matches some number symbols like ¼; the latter doesn't. See it on regex101.com.
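A quick way to see the difference described in the edit in Java itself (class and method names are illustrative): ¼ (U+00BC) has general category No, which is part of \p{N} but not of \p{Nd}.

```java
public class NumberClassDemo {
    // \p{N} covers Nd, Nl and No, so number symbols such as ¼ are kept.
    static String withN(String s) {
        return s.replaceAll("[^\\p{L}\\p{N}]+", "");
    }

    // \p{Nd} covers decimal digits only, so ¼ is removed.
    static String withNd(String s) {
        return s.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    }

    public static void main(String[] args) {
        String input = "1\u00BC cups"; // "1¼ cups"
        System.out.println(withN(input));  // 1¼cups
        System.out.println(withNd(input)); // 1cups
    }
}
```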