YAML 1.2 is (with one minor caveat regarding duplicate keys) a superset of JSON, so any valid JSON file is also a valid YAML file. However, the YAML 1.1 specification (which has the most library support) doesn't mention JSON. Most valid JSON files are valid YAML 1.1 files, but I found at least one exception by experimenting with PyYAML and Python's standard JSON library:
12345e999
is interpreted as a string by PyYAML and as IEEE infinity by Python's JSON library. Does anyone have a complete list of differences, determined more robustly than by testing edge cases in a particular implementation (that is, from a comparison of the specifications)? For example, I want to generate JSON strings that will be interpreted the same way by a JSON parser and a YAML 1.1 parser: what constraints must I place on my strings?
YAML supports comments where JSON does not. A comment starts with a # character (at the start of a line or after whitespace, and outside any quoted scalar) and runs to the end of the line. This has proven useful in configuration files, where a developer can document each setting right next to it.
As you noticed, what the specifications say is one thing, and what commonly available parsers (both YAML and JSON) actually process is another. You should therefore take several aspects into account and stick to the least common denominator, so that your JSON can still be loaded with a YAML parser.
On the JSON side there are multiple standards and best practices. Originally a JSON text had to have an object or array at the top level. This is still the case according to the fail1.json test file available on the json.org site:
"A JSON payload should be an object or array, not a string."
According to RFC 7159 any value can be at the top level (although, apart from a string, a top-level scalar makes for a rather boring JSON file):
A JSON text is a serialized value. Note that certain previous specifications of JSON constrained a JSON text to be an object or an array. Implementations that generate only objects or arrays where a JSON text is called for will be interoperable in the sense that all implementations will accept these as conforming JSON texts.
Because of the problems with JSON hijacking (by redefining the array handling in older browsers), there have been implementations that only accept an object at the top level (i.e. the first character of the file has to be `{`).
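If you want to stay on the safe side of those older implementations, a check can be added on top of a normal JSON parser. The sketch below assumes Python's standard `json` module, which follows RFC 7159 and happily returns a bare scalar (e.g. `json.loads("42")` gives `42`); `has_conservative_top_level` is a hypothetical name. Note that it checks the parsed value, not literally the first byte of the file.

```python
import json


def has_conservative_top_level(text: str, objects_only: bool = False) -> bool:
    """Return True if `text` is valid JSON whose top-level value is an
    object (or, unless objects_only is set, an array)."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return False
    # json.loads accepts any value at the top level (RFC 7159 behaviour),
    # so the older, stricter rule has to be enforced separately.
    allowed = (dict,) if objects_only else (dict, list)
    return isinstance(value, allowed)


print(has_conservative_top_level('{"a": 1}'))                   # True
print(has_conservative_top_level('[1, 2]'))                     # True
print(has_conservative_top_level('[1, 2]', objects_only=True))  # False
print(has_conservative_top_level('"just a string"'))            # False
```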
On the YAML side there are fewer competing standards than with JSON, but things get muddled by the persistent use of YAML 1.1, which is not helped by the fact that if you google for "yaml current spec", the first hit is yaml.org/spec/current.html, which is actually an old working draft for YAML 1.1.
Apart from the UTF-32 support mentioned in the other answer, which is largely a non-issue in a world that uses UTF-8 almost exclusively, there are a few things to take into account, especially if you want PyYAML to be able to parse your JSON (PyYAML still implements only most of YAML 1.1, close to eight years after the YAML 1.2 spec was released)¹:
Numbers in JSON don't need a dot in the mantissa, even if such a number has an exponent (RFC 7159: `number = [ minus ] int [ frac ] [ exp ]`, with both `frac` and `exp` optional),
but the Floating-Point Language-Independent Type for YAML™ Version 1.1 does require that dot:
|[-]?0\.([0-9]*[1-9])?e[-+](0|[1-9][0-9]+) (scientific)
(note that there is no `?` or `*` after the `\.`, so the dot is mandatory)
(In the YAML 1.2 spec this regex has changed to `-? [1-9] ( \. [0-9]* [1-9] )? ( e [-+] [1-9] [0-9]* )?`, allowing the dot to be omitted even when there is an `e` (lowercase only, `E` is not allowed) followed by an exponent.)
This is why your `12345e999` is handled differently by JSON (a number that overflows to infinity) and PyYAML (a string): in YAML 1.1 it can only be interpreted as a string, so it doesn't need quotes and can be a plain scalar.
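To reproduce the difference without installing PyYAML, the sketch below approximates how a YAML 1.1 parser types an unquoted scalar, using two regular expressions. They are an assumption for illustration: roughly the decimal int and base-10 float patterns of PyYAML's implicit resolver, leaving out the `.inf`/`.nan`, octal, hex and base-60 forms. The float pattern needs both the dot and a signed exponent (`[eE][-+]`), in line with the `e[-+]` in the YAML 1.1 regex quoted above.

```python
import json
import re

# Simplified YAML 1.1 plain-scalar patterns (assumption: approximates
# PyYAML's implicit resolvers; other int/float forms are omitted).
YAML11_INT = re.compile(r"[-+]?(?:0|[1-9][0-9_]*)")
# The dot is mandatory, and an exponent must carry an explicit sign.
YAML11_FLOAT = re.compile(r"[-+]?[0-9][0-9_]*\.[0-9_]*(?:[eE][-+][0-9]+)?")


def yaml11_plain_scalar_type(token: str) -> str:
    """Guess how a YAML 1.1 parser would resolve an unquoted scalar."""
    if YAML11_INT.fullmatch(token):
        return "int"
    if YAML11_FLOAT.fullmatch(token):
        return "float"
    return "str"


for token in ["12345e999", "1e10", "1.0e10", "1.0e+10", "2.5"]:
    print(token, "JSON:", repr(json.loads(token)),
          "YAML 1.1:", yaml11_plain_scalar_type(token))
```

With these patterns `12345e999`, `1e10` and `1.0e10` all resolve to `str` on the YAML 1.1 side, while `json.loads` returns floats for them (`inf` for the first); only `1.0e+10` and `2.5` are numbers for both.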
YAML 1.1 has escape sequences, but they are not a superset of what JSON supports. The forward slash (`/`) can be escaped as `\/` in JSON, but not in YAML 1.1 (it can in YAML 1.2, rule 53).
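Since `\/` decodes to a plain `/` in JSON, the escape is never actually needed. A minimal sketch, assuming Python's standard `json` module: `json.dumps` does not escape `/`, so re-serialising a document removes any `\/` escapes. Unlike a naive text replacement, this leaves an escaped backslash followed by a slash (`\\/`) intact.

```python
import json

# Valid JSON using the \/ escape, plus an escaped backslash before a slash.
source = r'{"url": "http:\/\/example.com\/x", "path": "C:\\/tmp"}'

data = json.loads(source)
print(data["url"])   # http://example.com/x
print(data["path"])  # C:\/tmp  (a literal backslash followed by a slash)

# json.dumps never writes the \/ escape, so this round trip produces JSON
# without it while preserving the decoded values exactly.
cleaned = json.dumps(data)
print(cleaned)
```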
In JSON as well as in YAML 1.1 you can use `\uNNNN` to specify a 16-bit Unicode code point. Although the YAML 1.1 spec (and YAML 1.2) mentions surrogate pairs in conjunction with the UTF-16 encoding, nothing is said about such pairs as escape sequences (`"\uD834\uDD1E"`). This sequence is explicitly mentioned in RFC 7159 as representing the G clef character (U+1D11E). I don't know of any YAML parser that supports this; PyYAML throws:
yaml.reader.ReaderError: unacceptable character #xd834: special characters are not allowed
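Python's `json` module shows both sides of this. It decodes the escaped pair to a single G clef character, as RFC 7159 describes, but with its default `ensure_ascii=True` it also *writes* characters outside the Basic Multilingual Plane as exactly such a surrogate pair. One way to avoid producing those escapes (a sketch, not the only option) is `ensure_ascii=False`, writing the result as UTF-8.

```python
import json

G_CLEF = "\U0001D11E"

# RFC 7159: the escaped surrogate pair represents the single code point U+1D11E.
decoded = json.loads(r'"\uD834\uDD1E"')
print(decoded == G_CLEF, len(decoded))  # True 1

# The default ensure_ascii=True turns non-BMP characters into a surrogate pair.
print(json.dumps(G_CLEF))               # "\ud834\udd1e"

# ensure_ascii=False keeps the character itself; store the text as UTF-8.
data = json.dumps(G_CLEF, ensure_ascii=False).encode("utf-8")
print(data)                             # b'"\xf0\x9d\x84\x9e"'
```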
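So as long as you write your JSON
- with a dot in the mantissa (and a sign in the exponent) of every number that has an exponent,
- without the `\/` escape sequence,
- without `\uNNNN` escapes for characters between `\uD7FF` and `\uE000` (exclusive), `\uFFFE` or `\uFFFF`,
you should be fine for both JSON and YAML (1.1) parsers.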
¹ In ruamel.yaml, a YAML 1.2 parser of which I am the author, the `\/` escape and scientific numbers without a dot are handled correctly: your `12345e999` loads as type `float` and prints as `inf`.
See here (specifically footnote 25). It says:
The incompatibilities were as follows: JSON allows extended character sets like UTF-32 and had incompatible unicode character escape syntax relative to YAML; YAML required a space after separators like comma, equals, and colon while JSON does not. Some non-standard implementations of JSON extend the grammar to include Javascript's /*...*/ comments. Handling such edge cases may require light pre-processing of the JSON before parsing as in-line YAML
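On the writing side the separator issue is easy to avoid. With Python's standard `json` module (used here only as an illustration), the `separators` argument controls it: the default when no `indent` is given already puts a space after `,` and `:`, whereas the popular compact setting removes those spaces, which per the quote above some YAML parsers may not accept.

```python
import json

data = {"a": [1, 2], "b": None}

# Default separators (", ", ": ") keep a space after each separator.
print(json.dumps(data))                         # {"a": [1, 2], "b": null}

# Compact output drops those spaces; avoid it if a YAML parser must read the file.
print(json.dumps(data, separators=(",", ":")))  # {"a":[1,2],"b":null}
```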
See also https://metacpan.org/pod/JSON::XS#JSON-and-YAML