How should escaped unicode be handled by json parsers and encoders?

Tags: json, unicode

The JSON spec allows escaped Unicode in JSON strings (of the form \uXXXX). It specifically mentions a restricted code point (a noncharacter) as a valid escape. Doesn't this imply that parsers should generate illegal Unicode from strings containing noncharacters and restricted code points?

An example:

{ "key": "\uFDD0" }

Decoding this requires either that your parser make no attempt to interpret the escaped code point, or that it generate an invalid Unicode string. Does it not?
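For reference, here is what one real parser does with the example. This is a minimal sketch assuming a JavaScript engine such as Node.js: its built-in JSON.parse neither rejects the escape nor substitutes anything, it simply produces a string containing U+FDD0. The variable names are illustrative only.

```javascript
// Minimal sketch: how JavaScript's built-in JSON.parse treats the example
// from the question. Run with Node.js or in a browser console.
const text = '{ "key": "\\uFDD0" }'; // the JSON text, with a literal backslash-u escape

const decoded = JSON.parse(text);
const cp = decoded.key.codePointAt(0);

console.log(decoded.key.length);  // 1 (a single UTF-16 code unit)
console.log(cp.toString(16));     // fdd0 (the noncharacter is kept as-is)

// On the way back out, JSON.stringify emits U+FDD0 as the raw character,
// not as a \u escape.
console.log(JSON.stringify(decoded) === '{"key":"\uFDD0"}'); // true
```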

asked Oct 04 '09 04:10

ArgumentError

2 Answers

When you decode, this seems like an appropriate use for the Unicode replacement character, U+FFFD.

From the Unicode Character Database:

used to replace an incoming character whose value is unknown or unrepresentable in Unicode
compare the use of U+001A as a control character to indicate the substitute function
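A minimal sketch of this suggestion in JavaScript, assuming you want to post-process the output of JSON.parse rather than write your own parser. The helpers isNoncharacter, replaceInvalid and parseReplacingInvalid are hypothetical names, and treating noncharacters the same way as unpaired surrogates is a policy choice made for illustration, not something the JSON spec requires.

```javascript
// Hypothetical helper: true for the 66 Unicode noncharacters,
// i.e. U+FDD0..U+FDEF plus U+xxFFFE and U+xxFFFF in every plane.
function isNoncharacter(cp) {
  return (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;
}

// Hypothetical helper: iterating a string with for...of yields whole code
// points; an unpaired surrogate comes out on its own as a single code unit.
function replaceInvalid(str) {
  let out = "";
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    const loneSurrogate = cp >= 0xd800 && cp <= 0xdfff;
    out += (loneSurrogate || isNoncharacter(cp)) ? "\uFFFD" : ch;
  }
  return out;
}

// Hypothetical wrapper: decode with JSON.parse, then clean every string value
// through a reviver. The reviver only sees values, so member names are untouched.
function parseReplacingInvalid(text) {
  return JSON.parse(text, (key, value) =>
    typeof value === "string" ? replaceInvalid(value) : value
  );
}

const result = parseReplacingInvalid(
  '{ "key": "\\uFDD0", "lone": "\\uDEAD", "ok": "\\uD834\\uDD1E" }'
);
console.log(result.key === "\uFFFD");        // true
console.log(result.lone === "\uFFFD");       // true
console.log(result.ok === "\uD834\uDD1E");   // true: a valid pair (U+1D11E) is kept
```

A stricter decoder could throw instead of substituting; which is preferable depends on whether silently altering data is acceptable in your application.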

answered Nov 15 '22 23:11

Adam Goode

What do you mean by “restricted codepoint”? What spec are you looking at that uses that language? (I can't find any such.)

If you are talking about the surrogates, then yes: JavaScript knows almost nothing(*) about surrogates and treats any sequence of UTF-16 code units as valid. JSON, being limited to what JavaScript supports, does the same.
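A quick way to see this in practice, assuming a JavaScript engine that implements ES2019 (for example a current Node.js): JSON.parse accepts an escaped unpaired surrogate and returns a string that is not well-formed UTF-16. The isWellFormedUtf16 helper is a hypothetical name written out by hand; newer engines also provide String.prototype.isWellFormed().

```javascript
// Sketch: JSON.parse accepts an unpaired surrogate escape.
const lone = JSON.parse('"\\uDEAD"');

console.log(lone.length);                     // 1
console.log(lone.charCodeAt(0).toString(16)); // dead

// Hypothetical helper: checks that every surrogate is properly paired.
function isWellFormedUtf16(str) {
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = str.charCodeAt(i + 1); // NaN past the end of the string
      if (!(next >= 0xdc00 && next <= 0xdfff)) return false;
      i++; // skip the low half of the pair
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      return false; // low surrogate without a preceding high surrogate
    }
  }
  return true;
}

console.log(isWellFormedUtf16(lone));           // false
console.log(isWellFormedUtf16("\ud834\udd1e")); // true

// With ES2019's well-formed JSON.stringify, the lone surrogate is written
// back out as a lowercase \u escape rather than as broken UTF-16.
console.log(JSON.stringify(lone)); // "\udead"
```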

*: The only part of JS I can think of that does anything special with surrogates is the encodeURIComponent function. It encodes to UTF-8, in which an unpaired surrogate cannot be represented. If you try:

encodeURIComponent('\ud834\udd1e'.substring(0, 1))

you will get an exception (a URIError).
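The footnote's example, expanded into a runnable sketch (assuming Node.js or a modern browser, where TextEncoder is a global): the complete surrogate pair encodes fine, the lone half throws a URIError, and TextEncoder, which also produces UTF-8, substitutes U+FFFD instead of throwing.

```javascript
const pair = "\ud834\udd1e";       // U+1D11E as a surrogate pair
const half = pair.substring(0, 1); // just the high surrogate

console.log(encodeURIComponent(pair)); // %F0%9D%84%9E

let caught = null;
try {
  encodeURIComponent(half);
} catch (e) {
  caught = e;
}
console.log(caught instanceof URIError); // true

// TextEncoder does not throw; it replaces the lone surrogate with U+FFFD,
// whose UTF-8 encoding is the bytes EF BF BD.
console.log(Array.from(new TextEncoder().encode(half))); // 239, 191, 189
```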

(Gah! SO seems not to allow characters from outside the Basic Multilingual Plane to be posted directly. Tsk.)

answered Nov 15 '22 21:11

bobince
