The JSON specification says that the only control characters that must be escaped are those with code points U+0000 through U+001F:
7. Strings
The representation of strings is similar to conventions used in the C
family of programming languages. A string begins and ends with
quotation marks. All Unicode characters may be placed within the
quotation marks, except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).
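To see how the rule quoted above plays out in practice, here is a minimal sketch using Python's standard `json` module. The helper name `encode_raw` and the sample characters are illustrative assumptions, not part of the specification; `ensure_ascii=False` is used so that the encoder escapes only what it must.

```python
import json

# One sample character from each group discussed in this question.
samples = {
    "C0  (U+001F)": "\u001f",
    "DEL (U+007F)": "\u007f",
    "C1  (U+0085)": "\u0085",
}

def encode_raw(s: str) -> str:
    # Hypothetical helper: with ensure_ascii=False the encoder leaves
    # non-ASCII characters as they are and escapes only what JSON requires.
    return json.dumps(s, ensure_ascii=False)

for label, ch in samples.items():
    # ascii() makes invisible characters visible in the printed result.
    print(label, ascii(encode_raw(ch)))
```

With CPython's encoder, the U+001F sample comes out as the escape `\u001f`, while DEL and U+0085 are written into the string unescaped, which matches the range given in the specification.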
The main point of escaping is to avoid corrupting the output when a JSON document or message is printed on a terminal or on paper.
But there are other control characters, such as DEL (U+007F), which is usually grouped with the C0 set, and the C1 set (U+0080 through U+009F). Shouldn't they also be escaped in JSON strings?
From the JSON specification:
8. String and Character Issues
8.1. Character Encoding
JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.
In UTF-8, every code point above 127 is encoded as multiple bytes, all with the high bit set. Continuation bytes take the values 0x80 through 0xBF, and half of those values (0x80 through 0x9F) coincide with the C1 control codes. So to keep those byte values out of a UTF-8 encoded JSON string, you would have to escape every code point whose encoding contains them, and a generic encoder would in practice simply escape all non-ASCII code points. That effectively eliminates the use of UTF-8, and the JSON text might as well be encoded in ASCII. Since ASCII is a subset of UTF-8, the standard does not disallow this. So if you are concerned about C1 control characters appearing in the byte stream, just escape them; but requiring every JSON representation to use ASCII would be wildly inefficient in anything but an English-language environment.
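The byte-level argument can be checked with a short sketch. The helpers `has_c1_range_byte` and `dumps_escaping_controls` are hypothetical names introduced here for illustration; the second one post-processes the output of `json.dumps`, which is safe only because DEL and C1 characters can appear in JSON text nowhere except inside strings.

```python
import json
import re

# "À" is U+00C0, which is not a control character,
# yet its UTF-8 encoding contains the byte 0x80.
encoded = "À".encode("utf-8")
print(encoded.hex())  # c380

def has_c1_range_byte(data: bytes) -> bool:
    # True if any byte value falls in 0x80..0x9F.
    return any(0x80 <= b <= 0x9F for b in data)

# Matches DEL and the C1 code points (U+007F through U+009F).
_EXTRA_CONTROLS = re.compile("[\u007f-\u009f]")

def dumps_escaping_controls(value) -> str:
    # Hypothetical helper: serialize without ASCII-only output,
    # then escape only DEL and C1 code points as \uXXXX.
    text = json.dumps(value, ensure_ascii=False)
    return _EXTRA_CONTROLS.sub(lambda m: "\\u%04x" % ord(m.group()), text)

escaped = dumps_escaping_controls("À\u0085")
# The C1 code point is escaped, but the UTF-8 bytes of "À" still
# include 0x80, so the byte stream is not free of that range.
print(has_c1_range_byte(escaped.encode("utf-8")))

# The default ensure_ascii=True output is pure ASCII, at the cost of size.
print(has_c1_range_byte(json.dumps("À\u0085").encode("utf-8")))
```

Escaping only the C1 code points removes the control characters themselves but not the byte values, which is the trade-off described above: a byte stream with nothing in 0x80 through 0x9F requires escaping far more than the control characters.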
UTF-16 and UTF-32 text could not possibly be processed by something that interprets C1 (or even C0) control codes byte by byte, because every ASCII character in those encodings is padded with 0x00 bytes, so the point is rather moot for those encodings.
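As a rough illustration of why byte-oriented handling of control codes cannot work for these encodings, the sketch below counts bytes in the C0 range for one small document. The helper `count_c0_bytes` and the sample document are assumptions made for this example.

```python
import json

text = json.dumps({"a": 1})  # a short, ASCII-only JSON text

def count_c0_bytes(data: bytes) -> int:
    # Number of bytes whose value lies in the C0 range 0x00..0x1F.
    return sum(b < 0x20 for b in data)

for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(encoding)
    print(encoding, len(data), count_c0_bytes(data))
```

Even though the text contains no control characters at all, the UTF-16 and UTF-32 forms are full of 0x00 bytes, so a consumer that treats individual bytes as control codes would fail long before any escaping rule mattered.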