
Generating anchors with PyYAML.dump()?

I'd like to be able to generate anchors in the YAML generated by PyYAML's dump() function. Is there a way to do this? Ideally the anchors would have the same name as the YAML nodes.

Example:

import yaml
yaml.dump({'a': [1,2,3]})
'a: [1, 2, 3]\n'

What I'd like to be able to do is generate YAML like:

import yaml
yaml.dump({'a': [1,2,3]})
'a: &a [1, 2, 3]\n'

Can I write a custom emitter or dumper to do this? Is there another way?

Geoff Lawler asked Aug 05 '13 18:08



2 Answers

By default, PyYAML only emits an anchor when it detects a reference to an object it has already seen:

>>> import yaml
>>>
>>> foo = {'a': [1,2,3]}
>>> doc = (foo,foo)
>>>
>>> print(yaml.safe_dump(doc, default_flow_style=False))
- &id001
  a:
  - 1
  - 2
  - 3
- *id001

If you want to override how the anchor is named, you'll have to customize the Dumper class, specifically its generate_anchor() method. The ANCHOR_TEMPLATE class attribute may also be useful.
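
As a rough sketch of the ANCHOR_TEMPLATE route, assuming Python 3 and a reasonably recent PyYAML: the default generate_anchor() fills a running counter into ANCHOR_TEMPLATE, so overriding that attribute in a subclass changes the prefix. The class name PrefixedAnchorDumper and the "shared" prefix are made up for illustration.

```python
import yaml


class PrefixedAnchorDumper(yaml.SafeDumper):
    # Hypothetical subclass: only the naming template changes.
    # The default template in PyYAML is 'id%03d'.
    ANCHOR_TEMPLATE = 'shared%03d'


foo = {'a': [1, 2, 3]}
doc = [foo, foo]  # the same dict object appears twice, so an anchor is needed

out = yaml.dump(doc, Dumper=PrefixedAnchorDumper, default_flow_style=False)
print(out)
```

This only changes the prefix. The anchor is still numbered and still only appears for objects referenced more than once.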

In your example the key is a simple scalar, but you need to take into account the many possibilities YAML allows, i.e. a key could be a sequence rather than a single value:

>>> import yaml
>>>
>>> foo = {('a', 'b', 'c'): [1,2,3]}
>>> doc = (foo,foo)
>>>
>>> print(yaml.dump(doc, default_flow_style=False))
!!python/tuple
- &id001
  ? !!python/tuple
  - a
  - b
  - c
  : - 1
    - 2
    - 3
- *id001

AlexH answered Oct 23 '22 05:10


This is not so easy, unless the data that you want to use for the anchor name is inside the node itself. The anchor gets attached to the node contents ([1, 2, 3] in your example), and the node doesn't know that this value is associated with the key 'a'.

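import yaml

l = [1, 2, 3]
foo = {'a': l, 'b': l}  # the same list object is referenced twice

class SpecialAnchor(yaml.Dumper):

    def generate_anchor(self, node):
        # Log the node that needs an anchor, then defer to the default naming.
        print('Generating anchor for {}'.format(str(node)))
        anchor = super().generate_anchor(node)
        print('Generated "{}"'.format(anchor))
        return anchor

# default_flow_style=None keeps the inline list style shown below on PyYAML 5.1+
y1 = yaml.dump(foo, Dumper=SpecialAnchor, default_flow_style=None)
print(y1)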

Gives you:

Generating anchor for SequenceNode(tag='tag:yaml.org,2002:seq', value=[ScalarNode(tag='tag:yaml.org,2002:int', value='1'), ScalarNode(tag='tag:yaml.org,2002:int', value='2'), ScalarNode(tag='tag:yaml.org,2002:int', value='3')])
Generated "id001"
a: &id001 [1, 2, 3]
b: *id001

So far I haven't found a way to get the key 'a' given the node...
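
One possible workaround, sketched here under assumptions and not something confirmed by the PyYAML docs as a supported pattern: walk the node tree before serialization, remember which scalar key first points at each value node, and look that up in generate_anchor(). The names KeyAnchorDumper, _collect_key_names, _key_names and _used are hypothetical. It relies on Serializer.serialize() being the step that triggers anchor generation, and it falls back to the default idNNN name when the key is not a scalar, is already taken, or contains characters the emitter rejects in anchors.

```python
import re
import yaml

# Characters PyYAML's emitter accepts in an anchor name.
_ANCHOR_SAFE = re.compile(r'[0-9A-Za-z_-]+')


class KeyAnchorDumper(yaml.SafeDumper):
    """Hypothetical dumper: name an anchor after the first key pointing at the node."""

    def serialize(self, node):
        # Reset per document, then record key names before anchors are generated.
        self._key_names = {}
        self._used = set()
        self._collect_key_names(node, set())
        super().serialize(node)

    def _collect_key_names(self, node, seen):
        if node in seen:  # shared nodes are visited once
            return
        seen.add(node)
        if isinstance(node, yaml.MappingNode):
            for key, value in node.value:
                if isinstance(key, yaml.ScalarNode):
                    # The first key wins if several keys share one value node.
                    self._key_names.setdefault(value, key.value)
                self._collect_key_names(key, seen)
                self._collect_key_names(value, seen)
        elif isinstance(node, yaml.SequenceNode):
            for item in node.value:
                self._collect_key_names(item, seen)

    def generate_anchor(self, node):
        name = self._key_names.get(node)
        if name is None or name in self._used or not _ANCHOR_SAFE.fullmatch(name):
            # Fall back to the default numbered name, skipping any taken ones.
            name = super().generate_anchor(node)
            while name in self._used:
                name = super().generate_anchor(node)
        self._used.add(name)
        return name


shared = [1, 2, 3]
data = {'a': shared, 'b': shared}
text = yaml.dump(data, Dumper=KeyAnchorDumper, default_flow_style=None)
print(text)
```

Note that this still only emits an anchor when the list is referenced more than once. Forcing an anchor onto a node that is used a single time, as in the original question, would need further changes to how the serializer decides which nodes get anchors.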

Eliot Blennerhassett answered Oct 23 '22 03:10