
How to remove high-ASCII characters from string like ®, ©, ™ in Java

Tags: java, string

I want to detect and remove high-ASCII characters like ®, ©, ™ from a String in Java. Is there any open-source library that can do this?

RandomQuestion asked Feb 15 '11 19:02


1 Answer

If you need to remove all non-US-ASCII characters (i.e. those outside the range 0x00-0x7F), you can do something like this:

s = s.replaceAll("[^\\x00-\\x7f]", "");

If you need to filter many strings, it would be better to use a precompiled pattern, since String.replaceAll compiles the regular expression again on every call:

import java.util.regex.Pattern;
...
private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7f]");
...
s = NON_ASCII.matcher(s).replaceAll("");
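
For reference, here is a minimal self-contained sketch of the precompiled-pattern approach. The class name AsciiFilter and the method name stripNonAscii are made up for illustration; only Pattern and Matcher come from the standard library.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class AsciiFilter {
    // Compiled once and reused; matches any character outside 0x00-0x7F.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7F]");

    private AsciiFilter() {}

    // Returns a copy of s with every non-US-ASCII character removed.
    static String stripNonAscii(String s) {
        return NON_ASCII.matcher(s).replaceAll("");
    }
}
```

With this in place, AsciiFilter.stripNonAscii("Acme® Widget™ ©2011") should return "Acme Widget 2011". Note that Matcher.replaceAll takes the replacement string as an argument, so the empty string has to be passed explicitly.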

And if it's really performance-critical, perhaps Alex Nikolaenkov's suggestion would be better.
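
The suggestion referred to is not reproduced here, so the following is only a guess at what a regex-free alternative might look like, not necessarily that approach: a plain loop that copies the ASCII characters into a StringBuilder. The class and method names are hypothetical.

```java
// Hypothetical helper class; one possible regex-free variant.
final class AsciiLoopFilter {
    private AsciiLoopFilter() {}

    // Copies only chars in the range 0x00-0x7F; everything else
    // (including both halves of a surrogate pair) is dropped.
    static String stripNonAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c <= 0x7F) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Whether this is actually faster than the precompiled pattern depends on the JVM and the input, so it is worth measuring before switching.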

axtavt answered Sep 18 '22 11:09