I want to detect and remove high-ASCII characters like ®, ©, ™ from a String in Java. Is there any open-source library that can do this?
If you need to remove all non-US-ASCII (i.e. outside 0x0-0x7F) characters, you can do something like this:
s = s.replaceAll("[^\\x00-\\x7f]", "");
If you need to filter many strings, it would be better to use a precompiled pattern:
private static final Pattern nonASCII = Pattern.compile("[^\\x00-\\x7f]");
...
s = nonASCII.matcher(s).replaceAll("");
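For completeness, here is a minimal self-contained sketch of the precompiled-pattern approach. The class name AsciiFilter and the helper stripNonAscii are made up for illustration; only Pattern.compile and Matcher.replaceAll(String) come from the JDK.

```java
import java.util.regex.Pattern;

class AsciiFilter {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7F]");

    // Hypothetical helper: removes every character outside 0x00-0x7F.
    static String stripNonAscii(String s) {
        return NON_ASCII.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // \u00AE = registered sign, \u00A9 = copyright sign, \u2122 = trade mark sign
        System.out.println(stripNonAscii("Acme\u00AE \u00A9 2011 Widget\u2122"));
    }
}
```

Note that the characters are simply dropped, so any whitespace around them stays in place; collapse it separately if that matters.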
And if it's really performance-critical, perhaps Alex Nikolaenkov's suggestion would be better.
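One common way to avoid the regex engine entirely is a plain loop over the characters. This is only a sketch of that general idea, not necessarily what the other answer proposes; the class and method names are hypothetical.

```java
class AsciiLoopFilter {
    // Hypothetical helper: copies only chars in the range 0x00-0x7F.
    static String stripNonAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Surrogate halves are above 0x7F, so supplementary
            // characters are dropped as well.
            if (c <= 0x7F) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripNonAscii("Widget\u2122 v2"));
    }
}
```

Whether this is actually faster for your inputs is something to measure rather than assume.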