
YAML - Dumping a nested object without types/tags

I'm trying to dump some Python objects out into YAML.

Currently, regardless of YAML library (pyyaml, oyaml, or ruamel) I'm having an issue where calling .dump(MyObject) gives me correct YAML, but seems to add a lot of metadata about the Python objects that I don't want, in a form that looks like:

!!python/object:MyObject and other similar strings.

I do not need to be able to rebuild the objects from the YAML, so I am fine with this metadata being removed completely.

Other questions on SO indicate that the common solution to this is to use safe_dump instead of dump.

However, safe_dump does not seem to work for nested objects (or objects at all), as it throws this error:

yaml.representer.RepresenterError: ('cannot represent an object', MyObject)

I see that the common workaround here is to manually specify Representers for the objects that I am trying to dump. My issue is that my objects are generated code that I don't have control over. I will also be dumping a variety of different objects.

Bottom line: Is there a way to dump nested objects using .dump, but where the metadata isn't added?

Azarantara, asked Apr 24 '19 09:04


2 Answers

Strictly speaking, "correct YAML" is not accurate; "YAML output that looks the way you want, except for the tag information" would be a better description. Fortunately, that description does tell us how you want your YAML to look, which matters because there are infinitely many ways to dump an object.

If you dump an object using ruamel.yaml:

import sys
import ruamel.yaml

class MyObject:
    def __init__(self, a, b):
        self.a = a
        self.b = b
        self.c = [a, b]

data = dict(x=MyObject(42, -1))

yaml = ruamel.yaml.YAML(typ='unsafe')
yaml.dump(data, sys.stdout)

this gives:

x: !!python/object:__main__.MyObject
  a: 42
  b: -1
  c: [42, -1]

You have a tag !!python/object:__main__.MyObject (yours might differ depending on where the class is defined, etc.) and each attribute of the instance is dumped as a key of a mapping.

There are multiple ways to get rid of the tag in that dump:

Registering classes

Add a classmethod named to_yaml() to each of your classes and register those classes. You have to do this for every class, but doing so allows you to use the safe dumper. An example of how to do this can be found in the documentation.

Post-process

It is fairly easy to post-process the output and remove the tags. For objects, the tag always occurs on the line before the mapping, so you can delete everything from !!python until the end of the line:

def strip_python_tags(s):
    result = []
    for line in s.splitlines():
        idx = line.find("!!python/")
        if idx > -1:
            line = line[:idx]
        result.append(line)
    return '\n'.join(result)

yaml.encoding = None
yaml.dump(data, sys.stdout, transform=strip_python_tags)

and that gives:

x: 
  a: 42
  b: -1
  c: [42, -1]

As anchors are dumped before the tag, this "stripping from !!python until the end of the line" also works when you dump objects that have multiple references.
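
The anchor behaviour can be checked without a YAML library. The sketch below runs the same strip_python_tags() on a hand-written sample; the sample text, including the anchor name &id001 and the alias *id001, is an assumption about what the unsafe dumper emits for a shared object, not captured output. The last lines show a limitation of this approach: a scalar that happens to contain the text !!python/ gets truncated as well.

```python
def strip_python_tags(s):
    result = []
    for line in s.splitlines():
        idx = line.find("!!python/")
        if idx > -1:
            line = line[:idx]  # keep everything before the tag, anchor included
        result.append(line)
    return '\n'.join(result)


# Hypothetical dump of one object referenced under two keys.
sample = (
    "x: &id001 !!python/object:__main__.MyObject\n"
    "  a: 42\n"
    "  b: -1\n"
    "y: *id001\n"
)
cleaned = strip_python_tags(sample)
print(cleaned)

# Limitation: the marker is also matched inside ordinary string values.
damaged = strip_python_tags('note: "see !!python/ docs"')
print(damaged)
```

The anchor survives because it sits to the left of the tag; the damaged line ends in the middle of the quoted string and is no longer valid YAML.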

Change the dumper

You can also change the unsafe dumper's routine for mappings so that it recognises the tag used for objects and replaces it with the "normal" tag for dict/mapping (a tag that is normally not written to the output):

yaml.representer.org_represent_mapping = yaml.representer.represent_mapping

def my_represent_mapping(tag, mapping, flow_style=None):
    if tag.startswith("tag:yaml.org,2002:python/object"):
        tag = u'tag:yaml.org,2002:map'
    return yaml.representer.org_represent_mapping(tag, mapping, flow_style=flow_style)

yaml.representer.represent_mapping = my_represent_mapping

yaml.dump(data, sys.stdout)

and that gives once more:

x:
  a: 42
  b: -1
  c: [42, -1]

These last two methods work for all instances of all Python classes that you define without extra work.

Anthon, answered Oct 21 '22 21:10


Fast and hacky:

import re
import yaml  # PyYAML, where dump() without a stream returns a string

"\n".join([re.sub(r" ?!!python/.*$", "", l) for l in yaml.dump(obj).splitlines()])

  • "\n".join(...) – concatenate the list back into a string
  • yaml.dump(obj).splitlines() – create a list of the lines of the YAML
  • re.sub(r" ?!!python/.*$", "", l) – replace every Python tag in the YAML with an empty string
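
The same pattern can also be applied in a single pass with re.MULTILINE, which skips the split/join step and keeps the trailing newline. This is only a sketch: the helper name strip_tags_regex and the sample string are made up for illustration, and the sample stands in for whatever yaml.dump(obj) returns.

```python
import re

# With re.MULTILINE, "$" matches at the end of every line, not only at the
# end of the whole string, so one sub() call handles all lines.
TAG_RE = re.compile(r" ?!!python/.*$", flags=re.MULTILINE)


def strip_tags_regex(text):
    """Remove '!!python/...' tags (and one preceding space) from YAML text."""
    return TAG_RE.sub("", text)


# Hypothetical dumper output used as input.
sample = "x: !!python/object:__main__.MyObject\n  a: 42\n  b: -1\n"
stripped = strip_tags_regex(sample)
print(stripped)
```

Because the optional leading space is part of the match, the tagged line comes out as "x:" without trailing whitespace.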

Dmitry Erokhin, answered Oct 21 '22 20:10