
JSON specifies "any UNICODE character"?

Maybe this is just my unfamiliarity with Unicode, so please correct me if I'm mistaken.

Looking at http://json.org/, the spec says that a string can include "any UNICODE character", but this confuses me.

  • JSON is a communication format, correct? At its core, everything must translate down to bytes.
  • In contrast, Unicode is a logical format and must be encoded before it can be transmitted, right?

So what did they mean there?

asked May 03 '10 16:05 by bukzor


2 Answers

From RFC 4627:

3.  Encoding

   JSON text SHALL be encoded in Unicode.  The default encoding is
   UTF-8.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.

           00 00 00 xx  UTF-32BE
           00 xx 00 xx  UTF-16BE
           xx 00 00 00  UTF-32LE
           xx 00 xx 00  UTF-16LE
           xx xx xx xx  UTF-8
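
The table above can be turned into a small detector. This is only a minimal sketch, not something taken from the RFC: the function name `detect_json_encoding` is hypothetical, the fallback to UTF-8 for inputs shorter than four octets is my own assumption, and byte order marks are not handled.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from the null pattern
    in its first four octets, following the table quoted above."""
    if len(data) < 4:
        # Assumption: too short to match the table, so use the default.
        return "utf-8"

    # True where the octet is 0x00, False otherwise.
    nulls = tuple(octet == 0 for octet in data[:4])

    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    # Anything else is treated as UTF-8 (xx xx xx xx).
    return patterns.get(nulls, "utf-8")
```

A caller could then decode before parsing, for example `json.loads(data.decode(detect_json_encoding(data)))`.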
answered Oct 26 '22 19:10 by cobbal


JSON is a serialization format whose strings can contain Unicode characters. What is actually sent over the wire is the byte representation of that text in some encoding. Transmission is normally over HTTP, whose headers tell the client which encoding was used, typically UTF-8.
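
To make the difference between characters and bytes concrete, here is a small sketch using Python's standard `json` module. The sample value `café` is just an illustration I picked; it shows the same JSON text written either with a `\u` escape (pure ASCII) or with the literal character, which only becomes bytes once it is encoded.

```python
import json

text = {"name": "café"}

# Default: non-ASCII characters are written as \uXXXX escapes,
# so the resulting JSON text contains only ASCII characters.
escaped = json.dumps(text)

# With ensure_ascii=False the character itself stays in the JSON text.
raw = json.dumps(text, ensure_ascii=False)

# The JSON text is still a sequence of characters at this point;
# encoding it yields the bytes that go over the wire.
payload = raw.encode("utf-8")

# Both forms decode back to the same data.
assert json.loads(escaped) == json.loads(payload.decode("utf-8")) == text
```

In UTF-8 the `é` takes two bytes, so `payload` is one byte longer than `raw` has characters.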

answered Oct 26 '22 19:10 by Darin Dimitrov