Formatting PyYAML dump() output

I have a list of dictionaries, which I want to serialize:

    list_of_dicts = [ { 'key_1': 'value_a', 'key_2': 'value_b'},
                      { 'key_1': 'value_c', 'key_2': 'value_d'},
                      ...
                      { 'key_1': 'value_x', 'key_2': 'value_y'}  ]

    yaml.dump(list_of_dicts, file, default_flow_style = False)

produces the following:

    - key_1: value_a
      key_2: value_b
    - key_1: value_c
      key_2: value_d
    (...)
    - key_1: value_x
      key_2: value_y

But I'd like to get this:

    - key_1: value_a
      key_2: value_b
                         <-|
    - key_1: value_c       |
      key_2: value_d       |  empty lines between blocks
    (...)                  |
                         <-|
    - key_1: value_x
      key_2: value_y

PyYAML documentation talks about dump() arguments very briefly and doesn't seem to have anything on this particular subject.

Editing the file manually to add newlines improves readability quite a lot, and the structure still loads just fine afterwards, but I have no idea how to make the dump() method generate them.

And in general, is there a way to have more control over output formatting besides simple indentation?

nope asked Jan 09 '13 05:01


2 Answers

There's no easy way to do this with the library (Node objects in the YAML dumper's syntax tree are passive and can't emit this information), so I ended up post-processing the dumped string:

    stream = yaml.dump(list_of_dicts, default_flow_style = False)
    file.write(stream.replace('\n- ', '\n\n- '))
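The same idea can be tried without PyYAML installed. This is only a sketch: the helper name `separate_top_level_items` is made up for illustration, and the hard-coded string stands in for what `yaml.dump(list_of_dicts, default_flow_style=False)` returns for the data in the question. It assumes top-level items start at column 0, as in the output shown there.

```python
def separate_top_level_items(dumped: str) -> str:
    """Insert a blank line before every "- " that starts a line, except the first."""
    # Only a newline directly followed by "- " matches, so indented
    # "- " entries (nested sequences) are left untouched.
    return dumped.replace("\n- ", "\n\n- ")


# Stand-in for yaml.dump(list_of_dicts, default_flow_style=False);
# hard-coded (an assumption) so the sketch runs without PyYAML.
dumped = (
    "- key_1: value_a\n"
    "  key_2: value_b\n"
    "- key_1: value_c\n"
    "  key_2: value_d\n"
    "- key_1: value_x\n"
    "  key_2: value_y\n"
)

spaced = separate_top_level_items(dumped)
print(spaced, end="")
```

Because blank lines between block-sequence items are not significant in YAML, the spaced text should load back to the same structure, which matches what the question reports for manually edited files.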
Yuri Baburov answered Oct 02 '22 00:10


PyYAML documentation only talks about dump() arguments briefly, because there is not much to say. This kind of control is not provided by PyYAML.

To allow preservation of such empty (and comment) lines in YAML that is loaded, I started the development of the ruamel.yaml library, a superset of the stalled PyYAML, with YAML 1.2 compatibility, many features added and bugs fixed. With ruamel.yaml you can do:

    import sys
    import ruamel.yaml

    yaml_str = """\
    - key_1: value_a
      key_2: value_b

    - key_1: value_c
      key_2: value_d

    - key_1: value_x  # a few before this were ellipsed
      key_2: value_y
    """

    yaml = ruamel.yaml.YAML()
    data = yaml.load(yaml_str)
    yaml.dump(data, sys.stdout)

and get the output exactly the same as the input string (including the comment).

You can also build the output that you want from scratch:

    import sys
    import ruamel.yaml

    yaml = ruamel.yaml.YAML()
    list_of_dicts = yaml.seq([ { 'key_1': 'value_a', 'key_2': 'value_b'},
                               { 'key_1': 'value_c', 'key_2': 'value_d'},
                               { 'key_1': 'value_x', 'key_2': 'value_y'}  ])

    for idx in range(1, len(list_of_dicts)):
        list_of_dicts.yaml_set_comment_before_after_key(idx, before='\n')

    ruamel.yaml.comments.dump_comments(list_of_dicts)
    yaml.dump(list_of_dicts, sys.stdout)

The conversion using yaml.seq() is necessary to create an object that allows the empty lines to be attached through special attributes.

The library also allows preserving or easily setting quotes and literal style on strings, as well as the format of ints (hex, octal, binary) and floats. It also supports separate indent specifications for mappings and sequences (although not for individual mappings or sequences).

Anthon answered Oct 02 '22 02:10