Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

JSON and escaping characters

Tags:

json

unicode

People also ask

Can JSON have escape characters?

In JSON the only characters you must escape are \, ", and control codes. Thus in order to escape your structure, you'll need a JSON specific function.

What should be escaped in JSON string?

JSON is pretty liberal: The only characters you must escape are \ , " , and control codes (anything less than U+0020).

How do I remove an escape character in JSON?

replaceAll("\\",""); But do note that those escape characters in no way make the JSON invalid or otherwise semantically different -- the '/' character can be optionally escaped with '\' in JSON.


This is not a bug in either implementation. There is no requirement to escape U+00B0. To quote the RFC:

2.5. Strings

The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

Any character may be escaped.

Escaping everything inflates the size of the data (all code points can be represented in four or fewer bytes in all Unicode transformation formats; whereas encoding them all makes them six or twelve bytes).

It is more likely that you have a text transcoding bug somewhere in your code and escaping everything in the ASCII subset masks the problem. It is a requirement of the JSON spec that all data use a Unicode encoding.


hmm, well here's a workaround anyway:

function JSON_stringify(s, emit_unicode)
{
   var json = JSON.stringify(s);
   return emit_unicode ? json : json.replace(/[\u007f-\uffff]/g,
      function(c) { 
        return '\\u'+('0000'+c.charCodeAt(0).toString(16)).slice(-4);
      }
   );
}

test case:

js>s='15\u00f8C 3\u0111';
15°C 3◄
js>JSON_stringify(s, true)
"15°C 3◄"
js>JSON_stringify(s, false)
"15\u00f8C 3\u0111"

This is SUPER late and probably not relevant anymore, but if anyone stumbles upon this answer, I believe I know the cause.

So the JSON encoded string is perfectly valid with the degree symbol in it, as the other answer mentions. The problem is most likely in the character encoding that you are reading/writing with. Depending on how you are using Gson, you are probably passing it a java.io.Reader instance. Any time you are creating a Reader from an InputStream, you need to specify the character encoding, or java.nio.charset.Charset instance (it's usually best to use java.nio.charset.StandardCharsets.UTF_8). If you don't specify a Charset, Java will use your platform default encoding, which on Windows is usually CP-1252.