Why does JSON encode UTF-16 surrogate pairs instead of Unicode code points directly?

To escape a code point that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair. So for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".

ECMA-404: The JSON Data Interchange Format

I believe that there is no need to encode this character at all, so it could be represented directly as "𝄞". However, should one wish to encode it, it must, per spec, be encoded as "\uD834\uDD1E", not (as would seem reasonable) as "\u1d11e". Why is this?

asked Jul 19 '16 15:07

TRiG

1 Answer

One of the key architectural features of JSON is that JSON-encoded objects are valid JavaScript literals, so they can be evaluated with the eval function, for example. Unfortunately, older JavaScript implementations only support Unicode escape sequences of exactly four hex digits (\uXXXX) in string literals, and each such escape denotes a single 16-bit UTF-16 code unit. The only portable way to escape a code point above 0xFFFF is therefore to write its two UTF-16 surrogates as two consecutive escapes. (The \u{...} syntax, which accepts arbitrary code points, was only introduced in ECMAScript 6.)
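To make the arithmetic behind the surrogate pair concrete, here is a minimal sketch in Python. The helper names to_surrogate_pair and json_escape are made up for this illustration and are not part of any library; only json.loads is a real standard-library call.

```python
import json
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high and low surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the Basic Multilingual Plane")
    offset = code_point - 0x10000       # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go into the low surrogate
    return high, low


def json_escape(code_point: int) -> str:
    """Build the twelve-character JSON escape for a non-BMP code point."""
    high, low = to_surrogate_pair(code_point)
    return f"\\u{high:04X}\\u{low:04X}"


escaped = json_escape(0x1D11E)          # the G clef from the question
decoded = json.loads(f'"{escaped}"')    # a JSON parser joins the pair again

# A five-digit escape is not an error: the parser reads \u1d11 and then
# treats the trailing "e" as an ordinary character.
misread = json.loads('"\\u1d11e"')
```

Here escaped is the \uD834\uDD1E sequence from the spec, decoded is a single-code-point string, and misread has two characters, which is why "\u1d11e" cannot mean U+1D11E.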

But as you mentioned, there's no need to use escape sequences at all if your application supports Unicode JSON text. Simply write the character directly in whatever Unicode encoding the JSON text uses (UTF-8, for example).
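As a rough illustration of both options, Python's standard json module can produce either form. This is only a sketch of one implementation's behaviour (it happens to emit lowercase hex digits); other encoders may differ in such details.

```python
import json

g_clef = "\U0001D11E"  # the G clef character from the question

# Default behaviour (ensure_ascii=True): non-ASCII characters are escaped,
# and characters outside the BMP become a surrogate pair.
escaped = json.dumps(g_clef)

# With ensure_ascii=False the character is written directly.
direct = json.dumps(g_clef, ensure_ascii=False)

# Both forms decode to the same one-character string.
assert json.loads(escaped) == json.loads(direct) == g_clef

# Serialised as UTF-8, the direct form is 4 bytes plus the two quotes.
utf8_bytes = direct.encode("utf-8")
```

The escaped form is pure ASCII and survives any transport, while the direct form is shorter but requires that the JSON text really be transmitted and read as Unicode.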

answered Oct 21 '22 15:10

nwellnhof

Tags: json, unicode, specifications, ecma
