Is there a way to construct an object using PyYAML construct_mapping after all nodes complete loading?

I am trying to make a yaml sequence in python that creates a custom python object. The object needs to be constructed with dicts and lists that are deconstructed after __init__. However, it seems that the construct_mapping function does not construct the entire tree of embedded sequences (lists) and dicts.
Consider the following:

import yaml

class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        self.l = l
        self.d = d

def foo_constructor(loader, node):
    values = loader.construct_mapping(node)
    s = values["s"]
    d = values["d"]
    l = values["l"]
    return Foo(s, d, l)
yaml.add_constructor(u'!Foo', foo_constructor)

f = yaml.load('''
--- !Foo
s: 1
l: [1, 2]
d: {try: this}''')

print(f)
# prints: 'Foo(1, {'try': 'this'}, [1, 2])'

This works fine because f holds the references to the l and d objects, which are actually filled with data after the Foo object is created.

Now, let's do something a smidgen more complicated:

class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        # assume two-value list for l
        self.l1, self.l2 = l
        self.d = d

Now we get the following error

Traceback (most recent call last):
  File "test.py", line 27, in <module>
    d: {try: this}''')
  File "/opt/homebrew/lib/python2.7/site-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/opt/homebrew/lib/python2.7/site-packages/yaml/constructor.py", line 39, in get_single_data
    return self.construct_document(node)
  File "/opt/homebrew/lib/python2.7/site-packages/yaml/constructor.py", line 43, in construct_document
    data = self.construct_object(node)
  File "/opt/homebrew/lib/python2.7/site-packages/yaml/constructor.py", line 88, in construct_object
    data = constructor(self, node)
  File "test.py", line 19, in foo_constructor
    return Foo(s, d, l)
  File "test.py", line 7, in __init__
    self.l1, self.l2 = l
ValueError: need more than 0 values to unpack

This is because the YAML constructor starts at the outermost level of nesting and constructs the object before all of its child nodes are finished. Is there a way to reverse the order and start with the most deeply nested objects first? Alternatively, is there a way to make construction happen only after the node's child objects have been loaded?

scicalculator asked Oct 18 '13 00:10



2 Answers

Well, what do you know. The solution I found was so simple, yet not so well documented.

The Loader class documentation clearly shows the construct_mapping method only takes in a single parameter (node). However, after considering writing my own constructor, I checked out the source, and the answer was right there! The method also takes in a parameter deep (default False).

def construct_mapping(self, node, deep=False):
    #...

So, the correct constructor method to use is

def foo_constructor(loader, node):
    values = loader.construct_mapping(node, deep=True)
    #...

I guess PyYAML could use some additional documentation, but I'm grateful that the deep parameter already exists.
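To make the difference concrete, here is a minimal sketch that runs the two-value Foo from the question through both settings of deep. The ShallowLoader and DeepLoader subclasses and the make_constructor helper are names invented for this illustration, not part of PyYAML; registering the constructor on a yaml.SafeLoader subclass instead of calling yaml.add_constructor is just one way to keep the two variants apart.

```python
import yaml


class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        # Unpacking needs the list to be fully populated already.
        self.l1, self.l2 = l
        self.d = d


def make_constructor(deep):
    """Hypothetical helper: build a !Foo constructor for a given deep flag."""
    def foo_constructor(loader, node):
        values = loader.construct_mapping(node, deep=deep)
        return Foo(values["s"], values["l"], values["d"])
    return foo_constructor


class ShallowLoader(yaml.SafeLoader):
    """Illustration only: !Foo is built with deep=False (the default)."""


class DeepLoader(yaml.SafeLoader):
    """Illustration only: !Foo is built with deep=True."""


ShallowLoader.add_constructor("!Foo", make_constructor(deep=False))
DeepLoader.add_constructor("!Foo", make_constructor(deep=True))

DOC = """--- !Foo
s: 1
l: [1, 2]
d: {try: this}
"""

# With deep=False the nested list is still empty when Foo() runs,
# so the tuple unpacking in __init__ raises ValueError.
try:
    yaml.load(DOC, Loader=ShallowLoader)
    shallow_error = None
except ValueError as exc:
    shallow_error = exc

# With deep=True the nested list and dict are filled in first.
foo = yaml.load(DOC, Loader=DeepLoader)

print(repr(shallow_error))
print(foo.l1, foo.l2, foo.d)
```

The only difference between the two loaders is the deep flag, so the ValueError from the question can be attributed to the nested list not being filled in yet.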

scicalculator answered Oct 13 '22 23:10


tl;dr: replace your foo_constructor with the one in the code at the bottom of this answer.


There are several problems with your code (and your solution). Let's address them step by step.

The code you present will not print what the comment on its last line says ('Foo(1, {'try': 'this'}, [1, 2])'), because no __str__() is defined for Foo. It prints something like:

<__main__.Foo object at 0x7fa9e78ce850>

This is easily remedied by adding the following method to Foo:

    def __str__(self):
        # print scalar, dict and list
        return('Foo({s}, {d}, {l})'.format(**self.__dict__))

and if you then look at the output:

Foo(1, [1, 2], {'try': 'this'})

This is close, but not what you promised in the comment either. The list and the dict are swapped because your foo_constructor() passes the arguments to Foo() in the wrong order.
This points to a more fundamental problem: your foo_constructor() needs to know too much about the object it is creating. It is not just the parameter order. Try:

f = yaml.load('''
--- !Foo
s: 1
l: [1, 2]
''')

print(f)

One would expect this to print Foo(1, None, [1, 2]) (with the default value of the unspecified d keyword argument).
What you get instead is a KeyError exception on d = values["d"].

You can of course use get('d'), etc., in foo_constructor() to solve this, but for correct behaviour you must then repeat the default values from your Foo.__init__() (which in your case just happen to all be None) for each and every parameter that has one:

def foo_constructor(loader, node):
    values = loader.construct_mapping(node, deep=True)
    s = values["s"]
    d = values.get("d", None)
    l = values.get("l", None)
    return Foo(s, l, d)

Keeping this updated is of course a maintenance nightmare.

So scrap the whole foo_constructor and replace it with something that looks more like how PyYAML does this internally:

def foo_constructor(loader, node):
    instance = Foo.__new__(Foo)
    yield instance
    state = loader.construct_mapping(node, deep=True)
    instance.__init__(**state)

This handles missing (default) parameters and doesn't have to be updated if the defaults for your keyword arguments change.

All of this in a complete example, including a self-referential use of the object (always tricky):

import yaml

class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        self.l1, self.l2 = l
        self.d = d

    def __str__(self):
        # print scalar, dict and list
        return('Foo({s}, {d}, [{l1}, {l2}])'.format(**self.__dict__))

def foo_constructor(loader, node):
    instance = Foo.__new__(Foo)
    yield instance
    state = loader.construct_mapping(node, deep=True)
    instance.__init__(**state)

yaml.add_constructor(u'!Foo', foo_constructor)

print(yaml.load('''
--- !Foo
s: 1
l: [1, 2]
d: {try: this}'''))
print(yaml.load('''
--- !Foo
s: 1
l: [1, 2]
'''))
print(yaml.load('''
&fooref
a: !Foo
  s: *fooref
  l: [1, 2]
  d: {try: this}
''')['a'])

gives:

Foo(1, {'try': 'this'}, [1, 2])
Foo(1, None, [1, 2])
Foo({'a': <__main__.Foo object at 0xba9876543210>}, {'try': 'this'}, [1, 2])

This was tested using ruamel.yaml (of which I am the author), which is an enhanced version of PyYAML. The solution should work the same for PyYAML itself.
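For readers on a recent PyYAML, where, as far as I know, yaml.load() has to be given an explicit Loader argument (PyYAML 6 and later), the same generator-style constructor can be sketched as follows. FooLoader is a hypothetical subclass name chosen for this example, and basing it on yaml.SafeLoader is an assumption of the sketch, not something the original example does.

```python
import yaml


class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        self.l1, self.l2 = l
        self.d = d


def foo_constructor(loader, node):
    # Hand back an uninitialised instance first so that aliases pointing
    # at this node can be resolved, then fill it in on the second pass.
    instance = Foo.__new__(Foo)
    yield instance
    state = loader.construct_mapping(node, deep=True)
    instance.__init__(**state)


class FooLoader(yaml.SafeLoader):
    """Hypothetical loader subclass that only adds the !Foo tag."""


FooLoader.add_constructor("!Foo", foo_constructor)

# Missing 'd' falls back to the default declared in Foo.__init__().
foo = yaml.load("""
--- !Foo
s: 1
l: [1, 2]
""", Loader=FooLoader)
print(foo.s, foo.l1, foo.l2, foo.d)

# Self-referential document: Foo.s points back at the enclosing mapping.
data = yaml.load("""
&fooref
a: !Foo
  s: *fooref
  l: [1, 2]
  d: {try: this}
""", Loader=FooLoader)
print(data["a"].s is data)
```

Registering the constructor on a dedicated loader class keeps the !Foo tag out of the loaders used elsewhere in a program, which is a design choice of this sketch and not a requirement of the technique.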

Anthon answered Oct 13 '22 23:10