PyYAML: load and dump a YAML file and preserve tags (!CustomTag)

I want to create a YAML filter that reads a YAML file, processes it and dumps it afterwards.

It must resolve any aliases (that already works nicely out of the box):

>>> yaml.dump(yaml.load("""
Foo: &bar
  name: bar
Foo2:
  <<: *bar
"""))

'Foo: {name: bar}\nFoo2: {name: bar}\n'

But it should also preserve any kind of !CustomTag: foo expression, such as:

>>> yaml.dump(yaml.load("Name: !Foo bar "))
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!Foo' in "<unicode string>", line 1, column 7:
Name: !Foo bar
      ^

I read the question pyYAML Errors on "!" in a string, and the approach there is close to what I need, except that it parses and outputs the custom tag as a quoted string, so it isn't a tag anymore:

>>> def default_ctor(loader, tag_suffix, node):
...   return tag_suffix + ' ' + node.value

>>> yaml.add_multi_constructor('', default_ctor)
>>> yaml.dump(yaml.load("Name: !Foo bar "), default_flow_style=False)
"Name: '!Foo bar'\n"

I guess not much is missing, but what? How can I load a file that contains arbitrary tags and dump them again afterwards?

Jan asked May 03 '17 16:05


1 Answer

Since default_ctor() returns a string (which is just a concatenation of the tag and the scalar), that string is what gets dumped. And because the string starts with !, dumping it as a scalar gets you quotes: an unquoted ! at the start of a scalar would be read back as a tag indicator.
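This is easy to confirm with nothing but PyYAML's safe functions. The minimal check below dumps an ordinary Python string that begins with ! (which is effectively what default_ctor() produces) and shows that it comes out quoted and loads back as a plain string, with no tag involved.

```python
import yaml

# What default_ctor() effectively produces: an ordinary str that starts with "!"
text = yaml.safe_dump({"Name": "!Foo bar"}, default_flow_style=False)
print(text, end="")  # Name: '!Foo bar'

# Loading it back yields the same plain string; no tag is involved any more
reloaded = yaml.safe_load(text)
print(reloaded)  # {'Name': '!Foo bar'}
```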

If you want to generically preserve the tag and value, you need to store them in a special type (and not a "normal" Python string) and provide a representer (i.e. a dumping routine) for that type:

import sys
import yaml

yaml_str = """\
Name: !Foo bar
Alt: !Bar foo
"""


class GenericScalar:
    """Keeps a scalar's value together with its (unknown) tag and style."""
    def __init__(self, value, tag, style=None):
        self._value = value
        self._tag = tag
        self._style = style

    @staticmethod
    def to_yaml(dumper, data):
        # data is a GenericScalar; emit it again with its original tag
        return dumper.represent_scalar(data._tag, data._value, style=data._style)


def default_constructor(loader, tag_suffix, node):
    # Registered for the prefix '', so tag_suffix is the complete tag, e.g. '!Foo'
    if isinstance(node, yaml.ScalarNode):
        return GenericScalar(node.value, tag_suffix, style=node.style)
    else:
        raise NotImplementedError('Node: ' + str(type(node)))


# Only tags without a dedicated constructor end up in the '' multi-constructor
yaml.add_multi_constructor('', default_constructor, Loader=yaml.SafeLoader)

yaml.add_representer(GenericScalar, GenericScalar.to_yaml, Dumper=yaml.SafeDumper)

data = yaml.safe_load(yaml_str)
yaml.safe_dump(data, sys.stdout, default_flow_style=False, allow_unicode=True)

This gives:

Alt: !Bar 'foo'
Name: !Foo 'bar'

Notes:

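  • It is unsafe to use PyYAML's load() without an explicit safe Loader. Don't use it; it is not necessary (as my code shows). What makes it worse is that older PyYAML versions give no feedback that there is any danger (since PyYAML 5.1, calling load() without a Loader emits a warning, and since 6.0 the Loader argument is required).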
  • PyYAML dumps every scalar that has an explicit tag with quotes, even if you preserve the node style as I do (or force the style to the empty string): the emitter only chooses the plain (unquoted) style when the scalar's tag could be implied from its value. To prevent that from happening you will have to dig pretty deep into the serialisation of nodes. I have been working on a fix for this in my ruamel.yaml package, as the quotes are very often not necessary.
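  • Your anchors and aliases don't actually get resolved. PyYAML just expands the merge key (<<) at load time, which builds a new mapping containing the referenced key-value pairs. If your YAML contains a normal alias (a reference to an anchored node without the merge key), both places refer to the same Python object, and you'll get an anchor and alias in your dumped YAML.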
  • The above will nicely raise an error if your node, after the tag, is anything but a scalar (i.e. a mapping or a sequence). It is possible to load/dump those generically as well, by adding some types and extending default_constructor with elif isinstance(node, yaml.MappingNode) and elif isinstance(node, yaml.SequenceNode) branches. I would make those create different types (that behave like a dict and a list, respectively), and if you go that route you should be aware that constructing them needs to happen in a two-step process (yield the constructed object, then get the sub-node values and fill the object); otherwise you cannot use self-referential structures (i.e. aliases within the node).
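  • By default PyYAML doesn't preserve the order of the elements in a mapping: safe_dump() sorts the keys, which is why Alt comes before Name in the output above. Since PyYAML 5.1 you can pass sort_keys=False, which on Python 3.7+ keeps the order in which the keys were loaded.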
  • You can have a tag that ends in a colon (!CustomTag:), but I find !CustomTag: foo not very human-friendly to read, as it looks very much like a key-value pair in a block-style mapping.
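The safety note can be checked without running anything dangerous. The document below uses the python/object/apply tag to ask for a function to be called while loading; a minimal sketch shows that safe_load() refuses it with a ConstructorError instead. The choice of os.getcwd is only an illustrative, harmless target.

```python
import yaml

# A document asking the loader to call a Python function during loading.
# A non-safe loader (e.g. yaml.unsafe_load) would actually call os.getcwd();
# the safe loader has no constructor for python/* tags and raises instead.
doc = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(doc)
    refused = False
except yaml.constructor.ConstructorError as exc:
    refused = True
    print("refused:", exc.problem)
```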
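The difference between the merge key and a plain alias can be seen by checking object identity after loading. This sketch uses only safe_load() and safe_dump(); id001 is the name PyYAML generates for the first anchor it has to emit.

```python
import yaml

# Merge key: Foo2 gets a *new* dict holding Foo's key-value pairs
merged = yaml.safe_load("""\
Foo: &bar
  name: bar
Foo2:
  <<: *bar
""")
merged_text = yaml.safe_dump(merged, default_flow_style=False)
print(merged_text, end="")
# Foo:
#   name: bar
# Foo2:
#   name: bar

# Plain alias: both keys refer to the very same dict object
aliased = yaml.safe_load("""\
Foo: &bar
  name: bar
Foo2: *bar
""")
aliased_text = yaml.safe_dump(aliased, default_flow_style=False)
print(aliased_text, end="")
# Foo: &id001
#   name: bar
# Foo2: *id001
```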
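Here is one way the mapping/sequence extension could look. It is a sketch under assumptions: the names TagPreservingLoader, TagPreservingDumper, GenericMapping and GenericSequence are hypothetical, and the registrations go on SafeLoader/SafeDumper subclasses so that yaml.SafeLoader itself stays untouched. The mapping and sequence constructors are generators: PyYAML takes the first (empty) object they yield, records it for the node, and resumes the generator later to fill it, which is what lets the self-referential !Node sequence load.

```python
import yaml


class TagPreservingLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass, so the '' multi-constructor stays local."""


class TagPreservingDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper subclass for the representers below."""


class GenericScalar:
    def __init__(self, value, tag, style=None):
        self.value, self.tag, self.style = value, tag, style


class GenericMapping(dict):
    """A dict that remembers the tag it was loaded with."""
    def __init__(self, tag):
        super().__init__()
        self.tag = tag


class GenericSequence(list):
    """A list that remembers the tag it was loaded with."""
    def __init__(self, tag):
        super().__init__()
        self.tag = tag


def construct_generic_mapping(loader, tag, node):
    data = GenericMapping(tag)
    yield data                                   # step 1: hand out the empty object
    data.update(loader.construct_mapping(node))  # step 2: fill it in


def construct_generic_sequence(loader, tag, node):
    data = GenericSequence(tag)
    yield data
    data.extend(loader.construct_sequence(node))


def default_constructor(loader, tag_suffix, node):
    # Not a generator itself: it returns either a value or a generator,
    # and PyYAML drives the generator (two-step construction) for us.
    if isinstance(node, yaml.ScalarNode):
        return GenericScalar(node.value, tag_suffix, style=node.style)
    elif isinstance(node, yaml.MappingNode):
        return construct_generic_mapping(loader, tag_suffix, node)
    elif isinstance(node, yaml.SequenceNode):
        return construct_generic_sequence(loader, tag_suffix, node)
    raise NotImplementedError('Node: ' + str(type(node)))


TagPreservingLoader.add_multi_constructor('', default_constructor)
TagPreservingDumper.add_representer(
    GenericScalar, lambda d, x: d.represent_scalar(x.tag, x.value, style=x.style))
TagPreservingDumper.add_representer(
    GenericMapping, lambda d, x: d.represent_mapping(x.tag, x))
TagPreservingDumper.add_representer(
    GenericSequence, lambda d, x: d.represent_sequence(x.tag, x))

src = """\
Name: !Foo bar
Point: !Point {x: 1, y: 2}
Path: !Path [a, b]
Loop: &loop !Node [1, *loop]
"""
data = yaml.load(src, Loader=TagPreservingLoader)
out = yaml.dump(data, Dumper=TagPreservingDumper,
                default_flow_style=False, sort_keys=False)
print(out, end="")
again = yaml.load(out, Loader=TagPreservingLoader)  # tags survive the round trip
```

Note that default_constructor itself must not contain yield: any function with a yield is a generator, so the scalar branch's return value would be lost. Dispatching to separate generator functions avoids that.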
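For key order, safe_dump() sorts mapping keys by default. Assuming PyYAML 5.1 or later (which added the sort_keys parameter) and Python 3.7+ (where dicts keep insertion order), a minimal check looks like this:

```python
import yaml

data = yaml.safe_load("Name: bar\nAlt: foo\n")

# Default: keys are sorted when dumping
sorted_text = yaml.safe_dump(data, default_flow_style=False)
# PyYAML >= 5.1: keep the order in which the keys were loaded
ordered_text = yaml.safe_dump(data, default_flow_style=False, sort_keys=False)

print(sorted_text, end="")   # Alt: foo / Name: bar
print(ordered_text, end="")  # Name: bar / Alt: foo
```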
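Whether a tag ending in a colon really survives loading can be checked with a throwaway multi-constructor that just returns the tag and the value. KeepTagLoader and tag_and_value are hypothetical names for this sketch:

```python
import yaml


class KeepTagLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass used only for this check."""


def tag_and_value(loader, tag_suffix, node):
    # Registered for the prefix '', so tag_suffix is the complete tag
    return (tag_suffix, loader.construct_scalar(node))


KeepTagLoader.add_multi_constructor('', tag_and_value)

data = yaml.load("Name: !CustomTag: foo\n", Loader=KeepTagLoader)
print(data)  # {'Name': ('!CustomTag:', 'foo')}
```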
Anthon answered Sep 20 '22 21:09