I've been working with the PyYAML parser for a few months now to convert file types as part of a data pipeline. I've found the parser to be quite idiosyncratic at times, and today I seem to have stumbled on another strange behavior. The file I'm currently converting contains the following section:
off:
    yes: "Flavor text for yes"
    no: "Flavor text for no"
I keep a list of the current nesting in the dictionary so that I can construct a flat document, but save the nesting to convert back to YAML later on. I got a TypeError saying I was trying to concatenate a str and a bool. I investigated and found that PyYAML is actually taking the section above and converting it to the following:
with open(filename, "r") as f:
    data = yaml.load(f.read())
print(data)
>> {False: {True: "Flavor text for yes", False: "Flavor text for no"}}
I did a quick check and found that PyYAML was doing this for yes, no, true, false, on and off. It only does this conversion if the keys are unquoted. Quoted values and keys are passed through fine. Looking for solutions, I found this behavior documented here.
Although it might be helpful to others to know that quoting the keys will stop PyYAML from doing this, I don't have this option, as I am not the author of these files and have written my code to touch the data as little as possible.
Is there a workaround for this issue or a way to override the default conversion behavior in PyYAML?
PyYAML is YAML 1.1 conformant for parsing and emitting, and for YAML 1.1 this is at least partly documented behavior, so no idiosyncrasy at all, but conscious design.
In YAML 1.2 (which in 2009 superseded the 1.1 specification from 2005) this usage of Off/On/Yes/No was dropped, among other changes.
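To see the YAML 1.1 resolution concretely, here is a minimal sketch that feeds the plain scalars from the question to an unmodified PyYAML. It assumes PyYAML is installed and uses `yaml.safe_load`; the variable names are only for illustration.

```python
import yaml

# Unquoted (plain) scalars that PyYAML resolves as YAML 1.1 booleans.
tokens = ["yes", "no", "on", "off", "true", "false"]
resolved = {token: yaml.safe_load(token) for token in tokens}
for token, value in resolved.items():
    print(token, "->", repr(value))

# The same words as mapping keys: unquoted becomes a bool, quoted stays a str.
unquoted_key = yaml.safe_load("yes: 1")
quoted_key = yaml.safe_load('"yes": 1')
print(unquoted_key, quoted_key)
```

Quoting is what switches the implicit resolution off, which is why the quoted keys in the question pass through unchanged.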
In ruamel.yaml (disclaimer: I am the author of that package), the round_trip_loader is a safe_loader that defaults to YAML 1.2 behaviour:
import ruamel.yaml as yaml

yaml_str = """\
off:
    yes: "Flavor text for yes"    # quotes around value dropped
    no: "Flavor text for no"
"""

data = yaml.round_trip_load(yaml_str)
assert 'off' in data
print(yaml.round_trip_dump(data, indent=4))
Which gives:
off:
    yes: Flavor text for yes    # quotes around value dropped
    no: Flavor text for no
If your output needs to be version 1.1 compatible then you can dump with an explicit version=(1, 1).
Since the quotes around the nested mapping's scalar values are unnecessary they are not emitted on writing out.
If you need to do this with PyYAML, rewrite the (global) rules it uses for boolean recognition:
import yaml
from yaml.resolver import Resolver

yaml_str = """\
off:
    yes: "Flavor text for yes"    # quotes around value dropped
    no: "Flavor text for no"
"""

# remove resolver entries for On/Off/Yes/No
for ch in "OoYyNn":
    if len(Resolver.yaml_implicit_resolvers[ch]) == 1:
        # only the bool pattern is registered for this first letter
        del Resolver.yaml_implicit_resolvers[ch]
    else:
        # keep the other patterns (e.g. null for n/N), drop only bool
        Resolver.yaml_implicit_resolvers[ch] = [x for x in
            Resolver.yaml_implicit_resolvers[ch] if x[0] != 'tag:yaml.org,2002:bool']

# safe_load shares the same Resolver table; recent PyYAML versions
# require an explicit Loader argument for yaml.load()
data = yaml.safe_load(yaml_str)
print(data)
assert 'off' in data
print(yaml.dump(data))
Which gives:
{'off': {'yes': 'Flavor text for yes', 'no': 'Flavor text for no'}}
off: {no: Flavor text for no, yes: Flavor text for yes}
This works because PyYAML keeps a global dict (Resolver.yaml_implicit_resolvers) which maps first letters to a list of (tag, re.match_pattern) values. For o, O, y and Y there is only one such pattern (and it can be deleted), but for n/N you can also match null/Null, so you have to delete the right pattern.
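As a rough illustration of that table, the following sketch lists the tags registered for a few first letters. It assumes a fresh interpreter with an unmodified PyYAML (that is, before any entries have been removed); `tags_by_letter` is just an illustrative name.

```python
from yaml.resolver import Resolver

# Collect the tags of the implicit resolvers registered per first letter.
tags_by_letter = {
    ch: [tag for tag, _regexp in Resolver.yaml_implicit_resolvers.get(ch, [])]
    for ch in "oyn"
}
for ch, tags in tags_by_letter.items():
    print(ch, tags)
```

The letters o and y should list only the bool tag, while n should list both bool and null, which is why n/N need the filtering branch rather than a plain delete.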
After that removal yes, no, on and Off are no longer recognised as bool, but True and False still are.
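If changing the global table is too invasive for your pipeline, one possible variant is to apply the same filtering to a loader subclass that carries its own copy of the table. This is only a sketch under that assumption, not something PyYAML prescribes; `WordsAsStringsLoader` and `BOOL_TAG` are hypothetical names introduced here.

```python
import yaml

BOOL_TAG = "tag:yaml.org,2002:bool"


class WordsAsStringsLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass with a private resolver table."""


# Copy the lists so the table shared by the stock loaders stays untouched.
WordsAsStringsLoader.yaml_implicit_resolvers = {
    ch: list(resolvers)
    for ch, resolvers in yaml.SafeLoader.yaml_implicit_resolvers.items()
}

# Drop the bool pattern for On/Off/Yes/No only; t/T/f/F keep it.
for ch in "OoYyNn":
    WordsAsStringsLoader.yaml_implicit_resolvers[ch] = [
        (tag, regexp)
        for tag, regexp in WordsAsStringsLoader.yaml_implicit_resolvers.get(ch, [])
        if tag != BOOL_TAG
    ]

yaml_str = """\
off:
    yes: "Flavor text for yes"
    no: "Flavor text for no"
"""

data = yaml.load(yaml_str, Loader=WordsAsStringsLoader)
print(data)
```

Only documents loaded with this loader see the change, so other code that calls yaml.safe_load keeps the YAML 1.1 behaviour. Dumping would need the same treatment on a Dumper subclass if you want the words written back unquoted.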