I'm a little unsure exactly where to point the finger (other than at myself of course)
JSON is a subset of YAML 1.2 http://www.yaml.org/spec/1.2/spec.html "every JSON file is also a valid YAML file"
JSON allows tabs as 'insignificant whitespace' http://www.ietf.org/rfc/rfc4627.txt "Insignificant whitespace is allowed ..."
YAML does not allow tabs for indentation http://www.yaml.org/spec/1.2/spec.html "tab characters must not be used in indentation"
So using my YAML parser to process the below JSON
{
\t"result" : "success"
}
NOTE: the \t is just to visualize, the input contains a real tab character.
Hits an error 'not allowed to use tab for indenting' <- which seems correct.
But then how does the "every JSON file is also a valid YAML file" rule hold, given that my file is valid JSON?
As the tab character is meaningless here, should I just run a pre-processing step to strip out all tabs? Since the only whitespace allowed inside strings is the space character, it should be safe to strip every tab in the file.
Regarding "Hits an error 'not allowed to use tab for indenting' <- which seems correct": it is not.
This is the relevant production in the Spec:
[140] c-flow-mapping(n,c) ::= "{" s-separate(n,c)?
                              ns-s-flow-map-entries(n,in-flow(c))? "}"
s-separate(n,c) resolves to s-separate-lines(n) here (because we are not inside block-key or flow-key). Skipping some steps, we reach s-separate-in-line which allows tab characters.
The bottom line is that this tab character in your JSON is not indentation. Indentation is only relevant in block style (i.e. when not using [ or { for sequences and mappings respectively). In flow style, whitespace only serves as separation.
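To see the JSON side of this, here is a minimal sketch using only Python's standard `json` module. The document is the one from the question, written without a trailing comma since JSON does not permit one. The helper name `raw_tab_in_string_is_rejected` is made up for illustration. It shows that a tab between tokens is accepted as insignificant whitespace, while a raw tab inside a string is rejected.

```python
import json

# The document from the question, with a real tab before the key.
doc = '{\n\t"result" : "success"\n}'

# RFC 4627 lists tab among the insignificant whitespace characters,
# so the standard-library parser accepts it between tokens.
parsed = json.loads(doc)
print(parsed)


def raw_tab_in_string_is_rejected() -> bool:
    """Return True if json.loads refuses a raw tab inside a string."""
    try:
        # Python turns \t into a real tab character here, so the JSON
        # text contains an unescaped control character inside a string.
        json.loads('"a\tb"')
    except json.JSONDecodeError:
        return True
    return False


print(raw_tab_in_string_is_rejected())
```

This only demonstrates what JSON permits. Whether a given YAML parser accepts the same input depends on how closely it follows the flow-style productions quoted above.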
Edit: Removed example link because it was somewhat misleading.
Edit 2: To answer your second question: No, do not strip tabs. They may be content inside scalars! See this example where a tab character actually determines the indentation of a block scalar.
JSON compatibility was only added in version 1.2 of the YAML specification. Implementing that compatibility on top of a parser originally designed for YAML 1.1 is not trivial.
The tab character has no fixed width in spaces; how it is displayed when editing depends on your editor's settings (or defaults). In practice this means that you should not use tab characters at all in block style, and most parsers don't allow them in flow style either.
So your parser should accept this, as ruamel.yaml>=0.17.24 does (when using the pure Python implementation). If it doesn't, you could filter the tabs out, but only at the beginning of lines, and only if you know that TAB is not used in literal- or flow-style scalars.
If the JSON is automatically generated, adapt the generator to use space(s).
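A rough sketch of such a filter in Python, assuming (as stated above) that no scalar in the input spans multiple lines with a tab at the start of a continuation line. The function name `expand_leading_tabs` and the `spaces_per_tab` parameter are made up for illustration. It replaces tabs with spaces rather than deleting them, and it leaves every tab after the first non-whitespace character of a line alone.

```python
import re


def expand_leading_tabs(text, spaces_per_tab=2):
    """Replace tabs in the leading whitespace of each line with spaces.

    Tabs that appear after the first non-whitespace character of a
    line are left untouched, since they may be scalar content.
    """
    def fix(match):
        return match.group(0).replace("\t", " " * spaces_per_tab)

    # With re.MULTILINE, ^ matches at the start of every line.
    return re.sub(r"^[ \t]+", fix, text, flags=re.MULTILINE)


cleaned = expand_leading_tabs('{\n\t"result" : "success"\n}')
print(cleaned)
```

The result can then be handed to the YAML parser in place of the original text.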
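If the generator happens to be Python, or the input is known to be valid JSON, one possible approach is to re-serialize it with spaces via the standard `json` module. This is only a sketch: the helper name `reindent_json` is made up, and a round trip like this can change details such as number formatting or duplicate keys.

```python
import json


def reindent_json(text, indent=2):
    """Parse JSON text and write it back out indented with spaces."""
    return json.dumps(json.loads(text), indent=indent)


tabbed = '{\n\t"result" : "success"\n}'
print(reindent_json(tabbed))
```

Because `json.dumps` escapes control characters inside strings, the output contains no raw tab characters at all.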