
Java - removing strange characters from a String

Tags: java, string

How do I remove strange and unwanted Unicode characters (such as a black diamond with question mark) from a String?

Updated:

Please tell me the Unicode character or regex that corresponds to "a black diamond with a question mark in it".

user224270 asked Mar 28 '11 17:03




2 Answers

A black diamond with a question mark is usually not a character from your original text; it is a placeholder for a character that could not be decoded or displayed. Unicode defines it as U+FFFD (REPLACEMENT CHARACTER): �. If the string contains a glyph that is missing from the font used to display it, you will also see a placeholder, and its appearance varies depending on the font you're using.
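
If the U+FFFD character is really present in the String (rather than being only a display issue), it can be removed like any other character. The following is a minimal sketch; the class and method names are made up for illustration.

```java
final class ReplacementCharDemo {
    // U+FFFD is the Unicode REPLACEMENT CHARACTER.
    static final char REPLACEMENT = '\uFFFD';

    // Hypothetical helper: returns the input without any U+FFFD characters.
    static String stripReplacementChars(String input) {
        // String.replace works on literal text, so no regex escaping is needed.
        return input.replace(String.valueOf(REPLACEMENT), "");
    }

    public static void main(String[] args) {
        String damaged = "caf\uFFFD au lait";
        System.out.println(stripReplacementChars(damaged)); // prints "caf au lait"
    }
}
```

Note that this only hides the symptom: the character that U+FFFD replaced is already lost, so if the text comes from a file or a network stream it is usually worth checking that it is decoded with the right charset.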

You can use java.text.Normalizer to decompose accented characters into a base letter plus combining marks, and then remove whatever is left outside the "normal" ASCII character set.
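
A rough sketch of that approach, assuming it is acceptable to lose every character that has no ASCII base letter; the class and method names are illustrative only.

```java
import java.text.Normalizer;

final class AsciiFoldDemo {
    // Hypothetical helper: fold accented letters to ASCII and drop the rest.
    static String toAscii(String input) {
        // NFD splits a letter such as e-acute into "e" plus a combining accent.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove everything outside 7-bit ASCII, including the combining marks.
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // The literal below is "Creme brulee" written with accented letters.
        String accented = "Cr\u00E8me br\u00FBl\u00E9e";
        System.out.println(toAscii(accented)); // prints "Creme brulee"
    }
}
```

Characters without a canonical decomposition (for example "ø" or "ß") are deleted outright rather than mapped to a similar ASCII letter, so this is a lossy conversion.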

asthasr answered Oct 18 '22 08:10


You can use String.replaceAll("[my-list-of-strange-and-unwanted-chars]", "").

There is no Character.isStrangeAndUnWanted(), you have to define what you want.
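
One way to "define what you want" is to whitelist the character categories you consider acceptable and delete everything else. The policy below (letters, digits, punctuation and plain spaces) is only an assumed example, and the names are hypothetical.

```java
final class WhitelistDemo {
    // Assumed policy: keep Unicode letters, digits, punctuation and the space
    // character; anything else (controls, symbols, U+FFFD, ...) is removed.
    static final String NOT_ALLOWED = "[^\\p{L}\\p{N}\\p{P} ]";

    static String keepAllowed(String input) {
        return input.replaceAll(NOT_ALLOWED, "");
    }

    public static void main(String[] args) {
        String dirty = "h\u0000i, w\uFFFDrld 42!";
        System.out.println(keepAllowed(dirty)); // prints "hi, wrld 42!"
    }
}
```

With this particular policy, symbols such as "$" or "+" are also removed because they are not in the punctuation category; adjust the character class to match what you actually want to keep.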

If you want to remove control characters you can do

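String str = "\u0000\u001f hi \n";
// Remove the C0 control characters U+0000 through U+001F (this range includes \n).
str = str.replaceAll("[\\u0000-\\u001f]", "");
System.out.println("[" + str + "]");

prints [ hi ] (both spaces are kept; the trailing newline is removed).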

EDIT: If you want to know the Unicode value of any 16-bit character (a Java char), you can do

int num = string.charAt(n);
System.out.println(num);
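
To inspect a whole string at once, the values can be printed in the usual U+XXXX notation. This is a small sketch assuming Java 8 or later; it iterates over code points rather than chars so that characters outside the 16-bit range are reported as one value. The class and method names are illustrative.

```java
final class CodePointDump {
    // Hypothetical helper: lists every code point of s in U+XXXX notation.
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            // %04X pads to at least four uppercase hex digits.
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("a\uFFFD")); // prints "U+0061 U+FFFD"
    }
}
```

Running this on a problem string shows exactly which characters need to go into the replaceAll character class.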
Peter Lawrey answered Oct 18 '22 09:10