 

Can someone clarify Gson's unicode encoding?

Tags: java, unicode, gson

Consider the following minimal example:

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

public class GsonStuff {

    public static void main(String[] args) {
        GsonBuilder builder = new GsonBuilder();
        Gson gson = builder.create();
        System.out.println(gson.toJson("Apostrophe: '"));
        // Outputs: "Apostrophe: \u0027"
    }
}

The apostrophe gets replaced by its Unicode escape in the printout. This is not just a display artifact: the String returned from the toJson method literally contains the characters '\', 'u', '0', '0', '2', '7'.

Decoding it with a JSON parser actually works and gives the string "Apostrophe: '" rather than "Apostrophe: \u0027". How should I decode it to get the same result?

An additional question: why doesn't an arbitrary Unicode character such as ش get encoded similarly?

Miquel asked Jul 06 '12 11:07



1 Answer

By default, Gson Unicode-escapes certain characters, of which ' is one. (See HTML_SAFE_REPLACEMENT_CHARS in JsonWriter for the complete list.)

To disable this, call

builder.disableHtmlEscaping(); 
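
To make the behaviour concrete, here is a minimal plain-Java sketch of what this HTML-safe escaping amounts to. It is not Gson's actual code: the helper names are hypothetical, and the character set (<, >, &, = and ') is my reading of HTML_SAFE_REPLACEMENT_CHARS. It illustrates why ' turns into a six-character escape while a letter like ش is left alone, and why decoding the escape gives back the original string. Any JSON parser should perform the same decoding, for example gson.fromJson(json, String.class).

```java
final class HtmlSafeEscapeSketch {

    // Hypothetical helper: replaces only the HTML-sensitive characters
    // with a backslash-u escape and copies every other character as-is.
    static String escapeHtmlSensitive(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':
                case '>':
                case '&':
                case '=':
                case '\'':
                    // Emit a backslash, a 'u', and four lowercase hex digits.
                    out.append(String.format("\\u%04x", (int) c));
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical helper: turns backslash-u escapes back into characters.
    // Assumes well-formed input (four hex digits after each backslash-u).
    static String unescapeUnicode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\u", i) && i + 6 <= s.length()) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = escapeHtmlSensitive("Apostrophe: '");
        System.out.println(escaped);                  // apostrophe is escaped
        System.out.println(unescapeUnicode(escaped)); // original string again

        // Arabic letter sheen (U+0634) from the question: not HTML-sensitive,
        // so it passes through unchanged.
        System.out.println(escapeHtmlSensitive("\u0634"));
    }
}
```

The escaped form and the original are the same JSON string value, so the escaping only matters if you inspect or compare the raw JSON text.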
Gustav Barkefors answered Sep 20 '22 10:09
