
Python parsing class from YAML

I am trying to dump the following class to YAML and then parse it back:

import numpy as np
class MyClass(object):
    YAMLTag = '!MyClass'

    def __init__(self, name, times, zeros):
        self.name   = name
        self._T     = np.array(times)
        self._zeros = np.array(zeros)

The YAML file looks like

!MyClass:
  name: InstanceId
  times: [0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0]
  zeros: [0.03, 0.03, 0.04, 0.03, 0.03, 0.02, 0.03]

To write, I have added to the class two methods

def toDict(self):
    return {'name'  : self.name,
            'times' : [float(t) for t in self._T],
            'zeros' : [float(t) for t in self._zeros]}
@staticmethod
def ToYAML(dumper, data):
    return dumper.represent_dict({data.YAMLTag : data.toDict()})

and to read, the method

@staticmethod
def FromYAML(loader, node):
    nodeMap = loader.construct_mapping(node)
    return MyClass(name  = nodeMap['name'],
                   times = nodeMap['times'],
                   zeros = nodeMap['zeros'])

and, following the PyYAML documentation, I added the following snippet to the same Python file, myClass.py:

import yaml

yaml.add_constructor(MyClass.YAMLTag, MyClass.FromYAML)
yaml.add_representer(MyClass,         MyClass.ToYAML)

Now, writing seems to work, but when reading the YAML back, the call

loader.construct_mapping(node)

seems to return the dictionary with empty data:

{'zeros': [], 'name': 'InstanceId', 'times': []}

How should I fix the reader to do this properly? Or perhaps I am not writing something out correctly? I spent a long time reading the PyYAML documentation and stepping through the package in a debugger, but I cannot figure out how to parse a more complicated structure. The only example I could find has a one-line class that parses easily.


Related: YAML parsing and Python


UPDATE

Manually parsing the node as follows worked:

name, times, zeros = None, None, None
for key, value in node.value:
    elementName = loader.construct_scalar(key)
    if elementName == 'name':
        name = loader.construct_scalar(value)
    elif elementName == 'times':
        times = loader.construct_sequence(value)
    elif elementName == 'zeros':
        zeros = loader.construct_sequence(value)
    else:
        raise ValueError('Unexpected YAML key %s' % elementName)

But the question still stands, is there a non-manual way to do this?

gt6989b asked Mar 23 '18 15:03


2 Answers

There are multiple problems with your approach, even leaving aside that you should read PEP 8, the style guide for Python code, in particular the part on Method Names and Instance Variables.

  1. As you indicate that you have spent a long time with the PyYAML documentation, you cannot have failed to notice that yaml.load() is unsafe. It is also almost never necessary to use it, certainly not if you write your own representers and constructors.

  2. You use dumper.represent_dict({data.YAMLTag : data.toDict()}), which dumps the object as a key-value pair. What you want to do, at least if you want to have a tag in your output YAML, is: dumper.represent_mapping(data.YAMLTag, data.toDict()). This will get you output of the form:

    !MyClass
    name: InstanceId
    times: [0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0]
    zeros: [0.03, 0.03, 0.04, 0.03, 0.03, 0.02, 0.03]
    

    i.e. a tagged mapping instead of your key-value pair, where the value is a mapping. (And I would have expected the first line to be '!MyClass': to make sure the scalar that starts with an exclamation mark is not interpreted as a tag).

  3. Constructing a complex object that is potentially self-referential (directly or indirectly) has to be done in two steps using a generator (PyYAML drives the generator in the correct way for you). In your code you assume that you have all the parameters needed to create an instance of MyClass. But if there is a self-reference, these parameters have to include that instance itself, and it has not been created yet. The proper example code in the PyYAML code base for this is construct_yaml_object() in constructor.py:

    def construct_yaml_object(self, node, cls):
        data = cls.__new__(cls)
        yield data
        if hasattr(data, '__setstate__'):
            state = self.construct_mapping(node, deep=True)
            data.__setstate__(state)
        else:
            state = self.construct_mapping(node)
            data.__dict__.update(state)
    

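    You don't have to use .__new__(), but you should take deep=True into account: without it, nested sequences and mappings are only filled in after your constructor has run, so a constructor that copies them immediately (as np.array() does) sees empty lists.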
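The effect of deep=True can be observed directly. The following is a minimal sketch, not code from the question: the tags !Shallow and !Deep, the DemoLoader subclass, and the `seen` dictionary are hypothetical names introduced only for illustration. Each constructor takes a snapshot of the nested list at the moment it runs.

```python
import yaml

# Snapshots of the nested list as seen from inside each constructor.
seen = {}


def shallow_constructor(loader, node):
    # Default deep=False: nested collections are filled in lazily,
    # after this function has returned.
    mapping = loader.construct_mapping(node)
    seen['shallow'] = list(mapping['times'])
    return mapping


def deep_constructor(loader, node):
    # deep=True: nested collections are fully built before returning.
    mapping = loader.construct_mapping(node, deep=True)
    seen['deep'] = list(mapping['times'])
    return mapping


class DemoLoader(yaml.SafeLoader):
    """Hypothetical loader subclass, so SafeLoader itself is not modified."""


DemoLoader.add_constructor('!Shallow', shallow_constructor)
DemoLoader.add_constructor('!Deep', deep_constructor)

shallow_result = yaml.load("!Shallow\ntimes: [0.0, 0.25]\n", Loader=DemoLoader)
deep_result = yaml.load("!Deep\ntimes: [0.0, 0.25]\n", Loader=DemoLoader)

print(seen['shallow'])           # snapshot taken inside the constructor
print(shallow_result['times'])   # the same list once loading has finished
print(seen['deep'])
```

In the shallow case the list object is the same one throughout; it is simply still empty when the constructor looks at it, which matches the empty 'times' and 'zeros' in the question.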

In general it is also useful to have a __repr__() that allows you to check the object that you load, with something more expressive than <__main__.MyClass object at 0x12345>.

The imports:

from __future__ import print_function

import sys
import yaml
from io import StringIO  # Python 3; cStringIO exists only on Python 2
import numpy as np

To check the correct workings of self-referential versions I added the self._ref attribute to the class:

class MyClass(object):
    YAMLTag = u'!MyClass'

    def __init__(self, name=None, times=(), zeros=(), ref=None):
        self.update(name, times, zeros, ref)

    def update(self, name, times, zeros, ref):
        self.name = name
        self._T = np.array(times)
        self._zeros = np.array(zeros)
        self._ref = ref

    def toDict(self):
        return dict(name=self.name,
                    times=self._T.tolist(),
                    zeros=self._zeros.tolist(),
                    ref=self._ref,
        )

    def __repr__(self):
        return "{}(name={}, times={}, zeros={})".format(
            self.__class__.__name__,
            self.name,
            self._T.tolist(),
            self._zeros.tolist(),
        )

    def update_self_ref(self, ref):
        self._ref = ref

The representer and constructor "methods" (static methods inside MyClass), followed by their registration at module level:

    @staticmethod
    def to_yaml(dumper, data):
        return dumper.represent_mapping(data.YAMLTag, data.toDict())

    @staticmethod
    def from_yaml(loader, node):
        value = MyClass()
        yield value
        node_map = loader.construct_mapping(node, deep=True)
        value.update(**node_map)


yaml.add_representer(MyClass, MyClass.to_yaml, Dumper=yaml.SafeDumper)
yaml.add_constructor(MyClass.YAMLTag, MyClass.from_yaml, Loader=yaml.SafeLoader)

And how to use it:

instance = MyClass('InstanceId',
                   [0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0],
                   [0.03, 0.03, 0.04, 0.03, 0.03, 0.02, 0.03])
instance.update_self_ref(instance)

buf = StringIO()
yaml.safe_dump(instance, buf)

yaml_str = buf.getvalue()
print(yaml_str)


data = yaml.safe_load(yaml_str)
print(data)
print(id(data), id(data._ref))

The above combined gives:

&id001 !MyClass
name: InstanceId
ref: *id001
times: [0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0]
zeros: [0.03, 0.03, 0.04, 0.03, 0.03, 0.02, 0.03]

MyClass(name=InstanceId, times=[0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0], zeros=[0.03, 0.03, 0.04, 0.03, 0.03, 0.02, 0.03]) 
139737236881744 139737236881744

As you can see the ids of data and data._ref are the same after loading.

The self-referential example above throws an error if you use the simplistic approach in your constructor, i.e. just calling loader.construct_mapping(node, deep=True) and returning the finished instance without the yield.
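The difference between the two constructor styles can be sketched as follows. This is an illustration under stated assumptions rather than the code from the answer: the class Node, the tag !Node, and the two loader subclasses are hypothetical names, and numpy is left out to keep the example small.

```python
import yaml


class Node(object):
    """Hypothetical stand-in for MyClass that only carries a self-reference."""

    def __init__(self, name=None, ref=None):
        self.name = name
        self.ref = ref


def eager_constructor(loader, node):
    # Builds all arguments first. The alias *id001 points back at an
    # object that does not exist yet, so PyYAML cannot resolve it.
    state = loader.construct_mapping(node, deep=True)
    return Node(**state)


def two_step_constructor(loader, node):
    # Step 1: hand PyYAML an empty instance that aliases can refer to.
    obj = Node()
    yield obj
    # Step 2: fill in the state once the instance is registered.
    state = loader.construct_mapping(node, deep=True)
    obj.__dict__.update(state)


class EagerLoader(yaml.SafeLoader):
    """Hypothetical loader using the one-step constructor."""


class TwoStepLoader(yaml.SafeLoader):
    """Hypothetical loader using the generator constructor."""


EagerLoader.add_constructor('!Node', eager_constructor)
TwoStepLoader.add_constructor('!Node', two_step_constructor)

DOC = "&id001 !Node\nname: InstanceId\nref: *id001\n"

try:
    yaml.load(DOC, Loader=EagerLoader)
    eager_error = None
except yaml.YAMLError as exc:
    eager_error = exc

loaded = yaml.load(DOC, Loader=TwoStepLoader)

print(type(eager_error).__name__)
print(loaded.ref is loaded)
```

The one-step version fails because the recursive node cannot be constructed, while the generator version returns an object whose ref attribute is the object itself.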

Anthon answered Sep 21 '22 13:09


Instead of

nodeMap = loader.construct_mapping(node)

try this:

nodeMap = loader.construct_mapping(node, deep=True)

Also, you have a little mistake in your YAML file:

!MyClass:

The colon at the end does not belong there.
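Put together, a round trip with both fixes might look like the sketch below. It makes a few assumptions that are not in the question: the class is renamed Series and uses plain lists instead of numpy (list() copies its input, much as np.array() does), and the representer and constructor are registered on hypothetical DemoDumper and DemoLoader subclasses rather than globally.

```python
import yaml


class Series(object):
    """Hypothetical numpy-free stand-in for MyClass."""
    yaml_tag = '!MyClass'

    def __init__(self, name, times, zeros):
        self.name = name
        # list() copies the input, so a list that is still empty at this
        # point would stay empty in the new object.
        self.times = list(times)
        self.zeros = list(zeros)


def to_yaml(dumper, data):
    # Emit a tagged mapping ("!MyClass" without a trailing colon).
    return dumper.represent_mapping(
        data.yaml_tag,
        {'name': data.name, 'times': data.times, 'zeros': data.zeros})


def from_yaml(loader, node):
    # deep=True makes sure the nested lists are populated before use.
    node_map = loader.construct_mapping(node, deep=True)
    return Series(**node_map)


class DemoDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass, so SafeDumper is left untouched."""


class DemoLoader(yaml.SafeLoader):
    """Hypothetical loader subclass, so SafeLoader is left untouched."""


DemoDumper.add_representer(Series, to_yaml)
DemoLoader.add_constructor(Series.yaml_tag, from_yaml)

original = Series('InstanceId', [0.0, 0.25, 0.5], [0.03, 0.03, 0.04])
text = yaml.dump(original, Dumper=DemoDumper)
restored = yaml.load(text, Loader=DemoLoader)

print(text)
print(restored.name, restored.times, restored.zeros)
```

This simple constructor is enough as long as the data contains no self-references; for those, the generator form from the other answer is needed.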

tinita answered Sep 21 '22 13:09