I want to create a YAML filter that reads a YAML file, processes it and dumps it afterwards.
It must resolve any aliases (that already works nicely out of the box):
>>> yaml.dump(yaml.load("""
Foo: &bar
  name: bar
Foo2:
  <<: *bar
"""))
'Foo: {name: bar}\nFoo2: {name: bar}\n'
But it should also preserve any kind of !CustomTag: foo expression, like:
>>> yaml.dump(yaml.load("Name: !Foo bar "))
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!Foo' in "<unicode string>", line 1, column 7:
    Name: !Foo bar
          ^
I read pyYAML Errors on "!" in a string and this is close to what I need, except that it parses and outputs the custom tag as quoted string, hence it isn't a tag anymore:
>>> def default_ctor(loader, tag_suffix, node):
...     return tag_suffix + ' ' + node.value
>>> yaml.add_multi_constructor('', default_ctor)
>>> yaml.dump(yaml.load("Name: !Foo bar "), default_flow_style=False)
"Name: '!Foo bar'\n"
I guess there is not much missing, but what? How can I load a file that contains any tags and dump them afterwards?
Since default_ctor() returns a string (just a concatenation of the tag and the scalar), that string is what gets dumped. And because it starts with !, which is the tag indicator in YAML, it cannot be written as a plain scalar, so the dumper quotes it.
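The quoting can be reproduced without any custom constructor. This is only a small sketch to illustrate the point, using an ordinary Python string that happens to look like a tagged scalar:

```python
import yaml

# An ordinary Python string that merely looks like a tagged scalar.
text = '!Foo bar'

dumped = yaml.safe_dump({'Name': text}, default_flow_style=False)
print(dumped)  # the value is emitted in quotes

# Reloading gives the plain string back, so no tag was ever involved.
reloaded = yaml.safe_load(dumped)
print(type(reloaded['Name']))
```

The quotes are what keep the round trip lossless for strings: without them, the leading ! would be read back as a tag.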
If you want to generically preserve the tag and value you need to store those in a special type (and not a "normal" Python string) and provide a representer (i.e. dumping routine) for that type:
import sys
import yaml

yaml_str = """\
Name: !Foo bar
Alt: !Bar foo
"""


class GenericScalar:
    def __init__(self, value, tag, style=None):
        self._value = value
        self._tag = tag
        self._style = style

    @staticmethod
    def to_yaml(dumper, data):
        # data is a GenericScalar
        return dumper.represent_scalar(data._tag, data._value, style=data._style)


def default_constructor(loader, tag_suffix, node):
    if isinstance(node, yaml.ScalarNode):
        return GenericScalar(node.value, tag_suffix, style=node.style)
    else:
        raise NotImplementedError('Node: ' + str(type(node)))


yaml.add_multi_constructor('', default_constructor, Loader=yaml.SafeLoader)
yaml.add_representer(GenericScalar, GenericScalar.to_yaml, Dumper=yaml.SafeDumper)

data = yaml.safe_load(yaml_str)
yaml.safe_dump(data, sys.stdout, default_flow_style=False, allow_unicode=True)
This gives:
Alt: !Bar 'foo'
Name: !Foo 'bar'
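To tie this back to both requirements in the question (resolved aliases and preserved tags), here is a hedged sketch of a complete filter. The names TaggedScalar, FilterLoader, FilterDumper, construct_unknown and represent_tagged are made up for this illustration. Registering on private subclasses instead of on yaml.SafeLoader and yaml.SafeDumper is a design choice of the sketch, not something the code above requires.

```python
import yaml


class TaggedScalar:
    """Hypothetical holder for a scalar value and its original tag."""

    def __init__(self, value, tag, style=None):
        self.value = value
        self.tag = tag
        self.style = style


class FilterLoader(yaml.SafeLoader):
    """Private loader subclass, so yaml.SafeLoader itself stays untouched."""


class FilterDumper(yaml.SafeDumper):
    """Private dumper subclass that never aliases TaggedScalar objects."""

    def ignore_aliases(self, data):
        # A merged-in scalar is the same Python object in both mappings;
        # without this the dumper would emit an anchor and an alias for it.
        return isinstance(data, TaggedScalar) or super().ignore_aliases(data)


def construct_unknown(loader, tag_suffix, node):
    # With the empty prefix '', tag_suffix is the complete tag, e.g. '!Foo'.
    if isinstance(node, yaml.ScalarNode):
        return TaggedScalar(node.value, tag_suffix, style=node.style)
    raise NotImplementedError('Node: ' + str(type(node)))


def represent_tagged(dumper, data):
    return dumper.represent_scalar(data.tag, data.value, style=data.style)


FilterLoader.add_multi_constructor('', construct_unknown)
FilterDumper.add_representer(TaggedScalar, represent_tagged)

source = """\
Foo: &bar
  name: !Foo bar
Foo2:
  <<: *bar
  extra: !Bar baz
"""

data = yaml.load(source, Loader=FilterLoader)
result = yaml.dump(data, Dumper=FilterDumper, default_flow_style=False)
print(result)
```

The merge key is expanded by the safe loader as before, while the !Foo and !Bar tags are written back out instead of being turned into quoted strings.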
Notes:
- Using load() on input you do not control is unsafe. Don't use it; it is not necessary (as my code shows). What makes it worse is that there is no feedback from PyYAML that there is any danger.
- You might have to extend default_constructor with some elif isinstance(node, yaml.MappingNode) and elif isinstance(node, yaml.SequenceNode) branches. I would make those create different types (that behave like a dict and a list, respectively), and if you go that route you should be aware that constructing those needs to happen in a two-step process (yield the constructed object, then get the sub-node values and fill the object); otherwise you cannot use self-referential structures (i.e. aliases within the node).
- You can have a tag such as !CustomTag: that ends in a colon, but I find !CustomTag: foo not very human-friendly to read, as it looks very much like a key-value pair in a block style mapping.
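The two-step construction mentioned in the notes could look roughly like the sketch below. TaggedMapping, TaggedSequence, NodeLoader, NodeDumper and the helper functions are hypothetical names introduced here; the scalar branch from the answer's code is left out for brevity, so this version only handles tagged mappings and sequences.

```python
import yaml


class TaggedMapping(dict):
    """Hypothetical dict subclass that remembers its YAML tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag


class TaggedSequence(list):
    """Hypothetical list subclass that remembers its YAML tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag


class NodeLoader(yaml.SafeLoader):
    pass


class NodeDumper(yaml.SafeDumper):
    pass


def fill_mapping(loader, tag, node):
    data = TaggedMapping(tag)
    yield data  # step 1: hand out the still-empty object
    data.update(loader.construct_mapping(node))  # step 2: fill it


def fill_sequence(loader, tag, node):
    data = TaggedSequence(tag)
    yield data
    data.extend(loader.construct_sequence(node))


def construct_collection(loader, tag_suffix, node):
    # Returning a generator lets PyYAML register the empty object first,
    # so an alias inside the node can already refer to it.
    if isinstance(node, yaml.MappingNode):
        return fill_mapping(loader, tag_suffix, node)
    if isinstance(node, yaml.SequenceNode):
        return fill_sequence(loader, tag_suffix, node)
    raise NotImplementedError('Node: ' + str(type(node)))


NodeLoader.add_multi_constructor('', construct_collection)
NodeDumper.add_representer(
    TaggedMapping, lambda dumper, data: dumper.represent_mapping(data.tag, data))
NodeDumper.add_representer(
    TaggedSequence, lambda dumper, data: dumper.represent_sequence(data.tag, data))

source = """\
Root: &r !Node
  name: top
  parts: !Parts [a, b]
  self: *r
"""

data = yaml.load(source, Loader=NodeLoader)
root = data['Root']
print(root.tag, root['parts'].tag, root['self'] is root)

dumped = yaml.dump(data, Dumper=NodeDumper, default_flow_style=False)
print(dumped)
```

Because the mapping refers to itself, the dumper has to keep an anchor and an alias for it in the output; that is the one case where an alias cannot be expanded.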