
Unicode Characters appearing as Question Marks in Java JSON Parsing

Tags: java, json, unicode

I have been searching for an answer to this for the past few days without finding a good pointer. Please merge this with the appropriate question if it turns out to be a duplicate.

I am pretty new to working with JSON, and as part of one of my projects I need to decode a JSON file and do further processing on it. However, when I decode it using the json-simple library, I get question marks in the parsed object instead of the actual characters. Sample code is shown below:

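import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;

// parse() throws org.json.simple.parser.ParseException on malformed input
String str = "{\"alias\": [\"Evr\u00f3pa\", \"\u05d0\u05d9\u05e8\u05d5\u05e4\"]}";
JSONParser parser = new JSONParser();
JSONObject jsonObject = (JSONObject) parser.parse(str);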

System.out.println(jsonObject) gives {"alias":["Evrópa","?????"]}

I tried using Json-lib too with the same result.

Thanks for the help.

asked Aug 08 '12 15:08 by Sri Gandhi


1 Answer

The problem isn't with your JSON; it's with your System.out.println(). Those characters can't be represented either in the character encoding that System.out uses in your environment or in the one your terminal (or your IDE console, if that is where you ran it) expects.
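One way to confirm that the parsed data is intact, independent of the console, is to print the code points as escapes instead of the characters themselves. The following is only a minimal sketch: the class name InspectString and the helper escapeNonAscii are hypothetical names made up for illustration and are not part of json-simple.

```java
import java.nio.charset.Charset;

class InspectString {
    // Hypothetical helper (not a library method): keeps ASCII chars as they are
    // and renders every other char as a backslash-u escape with four hex digits.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Shows which charset the JVM picked up as its default
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Stand-in for a value taken from the parsed JSON object
        String parsed = "\u05d0\u05d9\u05e8\u05d5\u05e4";
        // Pure ASCII output, so it survives any console encoding
        System.out.println(escapeNonAscii(parsed));
    }
}
```

If the escaped output shows the expected code points, the string in memory is fine and only the output encoding is at fault.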

Files cannot contain Unicode characters as such. Files are streams of bytes, whereas a Unicode character is an abstract code point that takes one or more bytes depending on the encoding (one to four in UTF-8; a Java char is a 16-bit UTF-16 code unit). This is where character encodings become relevant: Unicode characters must be converted to a sequence of bytes before they can be written to a file (including System.out). One of the most commonly used encodings for Unicode characters is UTF-8. The trick for programmers is to always use the correct character encoding when converting between bytes and characters. Getting the encoding wrong in a single place, for example in a debug println() call, gives erroneous and misleading output.
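To illustrate the conversion between characters and bytes, here is a rough sketch that pushes the same string through a PrintStream with two different encodings. The class name EncodingDemo and the helper printWith are hypothetical; whether wrapping System.out in a UTF-8 PrintStream actually fixes the display is an assumption that only holds if your terminal or IDE console is itself set to UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class EncodingDemo {
    // Hypothetical helper: prints text through a PrintStream that uses the
    // named encoding and returns the bytes that PrintStream produced.
    static byte[] printWith(String text, String encoding) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, encoding);
            out.print(text);
            out.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String hebrew = "\u05d0\u05d9\u05e8\u05d5\u05e4";

        // US-ASCII cannot represent these letters, so each one becomes '?'
        byte[] ascii = printWith(hebrew, "US-ASCII");
        System.out.println("US-ASCII bytes: " + ascii.length);

        // UTF-8 encodes each of these Hebrew letters as two bytes
        byte[] utf8 = printWith(hebrew, "UTF-8");
        System.out.println("UTF-8 bytes: " + utf8.length);

        // Assumption: the console decodes its input as UTF-8
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(hebrew);
    }
}
```

The US-ASCII case reproduces the question marks from the question: the string itself is unchanged, and the characters are lost only at the point where they are turned into bytes.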

answered Oct 02 '22 16:10 by dsh