How do I remove strange and unwanted Unicode characters (such as a black diamond with question mark) from a String?
Updated:
Please tell me the Unicode character string or regex that corresponds to "a black diamond with a question mark in it".
You can use a regular expression and the replaceAll() method of the java.lang.String class to remove all special characters from a String.
Strings - Special Characters: the solution to avoid this problem is to use the backslash escape character.
A black diamond with a question mark is the Unicode replacement character, U+FFFD: �. It is not a character from your original text but a placeholder, typically substituted when bytes cannot be decoded in the assumed character encoding or when a character cannot be displayed. Its appearance varies depending on the font you're using.
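To answer the "which string or regex" part directly: in Java source the character can be written as the escape \uFFFD. The sketch below is only an illustration; the class and method names are made up for this example, and it simply deletes the placeholder without recovering whatever character was lost.

```java
public class ReplacementCharDemo {
    // U+FFFD is the Unicode REPLACEMENT CHARACTER (the diamond with a question mark).
    static final char REPLACEMENT = '\uFFFD';

    // Hypothetical helper: removes every occurrence of U+FFFD from the input.
    static String stripReplacementChars(String input) {
        return input.replace(String.valueOf(REPLACEMENT), "");
    }

    public static void main(String[] args) {
        String damaged = "caf\uFFFD au lait";
        System.out.println(stripReplacementChars(damaged)); // prints "caf au lait"
    }
}
```

If the placeholder shows up because the text was decoded with the wrong charset, fixing the decoding step is usually the better cure; stripping only hides the symptom.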
You can use java.text.Normalizer to decompose accented characters, after which a regex can remove the Unicode characters that are not in the "normal" ASCII character set.
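A minimal sketch of that approach, assuming you want plain ASCII output: normalize to NFD so an accented letter becomes a base letter plus a combining mark, then drop everything outside ASCII. The class and method names here are hypothetical.

```java
import java.text.Normalizer;

public class AsciiFolder {
    // Hypothetical helper: folds text to ASCII, losing anything that has no ASCII base.
    static String toAscii(String input) {
        // NFD splits an accented letter into its base letter and a combining mark.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove every non-ASCII character, which includes the combining marks.
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // The escapes spell "Creme brulee" with its French accents.
        System.out.println(toAscii("Cr\u00E8me br\u00FBl\u00E9e")); // prints "Creme brulee"
    }
}
```

Characters that have no decomposition into an ASCII base letter are removed entirely rather than transliterated, so check that this loss is acceptable for your data.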
You can use a String.replaceAll("[my-list-of-strange-and-unwanted-chars]","")
There is no Character.isStrangeAndUnWanted(); you have to define what you want.
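One way to "define what you want" is to list what you keep instead of what you remove. The sketch below assumes that letters, digits, punctuation, and plain spaces are wanted and everything else is not; that choice, and the names used, are illustrative only.

```java
public class UnwantedCharFilter {
    // Assumption: anything that is not a letter (L), number (N), punctuation (P),
    // or a plain space counts as "strange and unwanted".
    static String keepAllowed(String input) {
        return input.replaceAll("[^\\p{L}\\p{N}\\p{P} ]", "");
    }

    public static void main(String[] args) {
        // Contains a NUL control character and U+FFFD, both of which are dropped.
        System.out.println(keepAllowed("h\u0000i\uFFFD, ok!")); // prints "hi, ok!"
    }
}
```

An allow-list like this tends to be safer than a deny-list, because characters you never thought of are removed by default.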
If you want to remove control characters, you can do
String str = "\u0000\u001f hi \n";
str = str.replaceAll("[\u0000-\u001f]", "");
System.out.println(str);
This prints " hi " (the spaces are kept; the control characters, including the newline, are removed).
EDIT: If you want to know the Unicode value (as a decimal number) of any 16-bit character, you can do
int num = string.charAt(n);
System.out.println(num);
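charAt() returns a single 16-bit char, so characters outside the Basic Multilingual Plane show up as two surrogate values. As a hedged alternative, the sketch below uses String.codePoints() (Java 8 and later) and prints each code point in the usual U+XXXX notation; the class and method names are invented for the example.

```java
public class CodePointDump {
    // Hypothetical helper: lists every code point of the string in U+XXXX form.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        // codePoints() merges surrogate pairs into one int per character.
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("a\uFFFD")); // prints "U+0061 U+FFFD"
    }
}
```

Running this on a suspicious string is a quick way to find out which code points you actually need to remove.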