
In the JSON spec, what does "Since the first two characters of a JSON text will always be ASCII characters" mean?

Tags: json

RFC 4627 on JSON reads:

  3. Encoding

    JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

    Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.

What does "Since the first two characters of a JSON text will always be ASCII characters [RFC0020]" mean? I've looked at RFC0020 but couldn't find anything about it. A JSON text could start with {" or { " (i.e. with whitespace before the quote).

dan gibson asked Nov 20 '10 10:11





1 Answer

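It means that a JSON text always starts with ASCII characters, so the encoding can be determined from the start of the stream/file. Under RFC 4627 the root must be an object or an array, so the first character is { or [ (or leading whitespace), and the second is either whitespace or the start of a value or closing bracket, all of which are ASCII. Non-ASCII characters are only permitted inside strings, which cannot be the root value.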

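UTF-16 and UTF-32 streams may carry a BOM at the start, but the check the RFC describes does not depend on one. Because the first two characters are known to be ASCII, each of them is encoded as one non-zero octet padded with a fixed number of zero octets in UTF-16 and UTF-32, and with none in UTF-8. The pattern of nulls in the first four octets therefore identifies both the encoding and the byte order.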
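The null-pattern check from the RFC excerpt in the question can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `detect_json_encoding` is made up for this example, the input is assumed to be an RFC 4627 JSON text (root object or array) without a BOM, and anything that does not match one of the five patterns is simply reported as "unknown".

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of an RFC 4627 JSON text from its first four octets.

    Hypothetical helper: assumes no BOM and a root object or array.
    """
    head = data[:4]
    if len(head) < 4:
        # The shortest JSON text has two characters, which already takes
        # four octets in UTF-16, so anything shorter can only be UTF-8.
        return "utf-8"

    # True where the octet is a null, False otherwise.
    nulls = tuple(b == 0 for b in head)

    # Patterns from RFC 4627 section 3 (00 = null octet, xx = non-null).
    patterns = {
        (True, True, True, False): "utf-32-be",     # 00 00 00 xx
        (True, False, True, False): "utf-16-be",    # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",     # xx 00 00 00
        (False, True, False, True): "utf-16-le",    # xx 00 xx 00
        (False, False, False, False): "utf-8",      # xx xx xx xx
    }
    return patterns.get(nulls, "unknown")
```

For example, `detect_json_encoding('{"a": 1}'.encode("utf-16-le"))` sees the octets `7b 00 22 00` and matches the `xx 00 xx 00` pattern. A stream that does start with a BOM would need to be handled before this check.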

I assume the spec mentions this explicitly because it is not possible for most other text streams/files: a generic text file can start with any two characters, so its first bytes are not known in advance.

Oded answered Sep 24 '22 00:09
