With this data structure:
d = {
    (2,3,4): {
        'a': [1,2],
        'b': 'Hello World!',
        'c': 'Voilà!'
    }
}
I would like to get this YAML:
%YAML 1.2
---
[2,3,4]:
  a:
    - 1
    - 2
  b: Hello World!
  c: 'Voilà!'
Unfortunately I get this format:
$ print ruamel.yaml.dump(d, default_flow_style=False, line_break=1, explicit_start=True, version=(1,2))
%YAML 1.2
---
? !!python/tuple
- 2
- 3
- 4
: a:
  - 1
  - 2
  b: Hello World!
  c: !!python/str 'Voilà!'
I cannot configure the output I want, even with safe_dump. How can I do that without manual regex work on the output?
The only ugly solution I found is something like:
import re

def rep(x):
    # collect the integers of the block sequence and emit them as a flow-style key
    return repr([int(y) for y in re.findall(r'^\??\s*-\s*(\d+)', x.group(0), re.M)]) + ":\n"

print(re.sub(r'\?(\s*-\s*(\w+))+\s*:', rep,
    ruamel.yaml.dump(d, default_flow_style=False, line_break=1, explicit_start=True, version=(1,2))))
You cannot get what you want using ruamel.yaml.dump(), but with the new API, which has a few more controls, you can come very close.
import sys
import ruamel.yaml

d = {
    (2,3,4): {
        'a': [1,2],
        'b': 'Hello World!',
        'c': 'Voilà!'
    }
}

def prep(d):
    # recursively prepare the data in place for dumping
    if isinstance(d, dict):
        needs_restocking = False
        for k in d:
            if isinstance(k, tuple):
                needs_restocking = True
            try:
                # strings containing 'à' get single quotes in the output
                if 'à' in d[k]:
                    d[k] = ruamel.yaml.scalarstring.SingleQuotedScalarString(d[k])
            except TypeError:
                pass
            prep(d[k])
        if not needs_restocking:
            return
        # remove all items and re-insert them, so the key order is preserved
        items = list(d.items())
        for (k, v) in items:
            d.pop(k)
        for (k, v) in items:
            if isinstance(k, tuple):
                k = ruamel.yaml.comments.CommentedKeySeq(k)
            d[k] = v
    elif isinstance(d, list):
        for item in d:
            prep(item)

yaml = ruamel.yaml.YAML()
yaml.indent(mapping=2, sequence=4, offset=2)
yaml.version = (1, 2)
prep(d)  # changes d in place, there is no return value
yaml.dump(d, sys.stdout)
which gives:
%YAML 1.2
---
[2, 3, 4]:
  a:
    - 1
    - 2
  b: Hello World!
  c: 'Voilà!'
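The "restocking" step in prep() may look roundabout, so here is a minimal stdlib-only sketch of why it is probably needed. KeySeq is a hypothetical stand-in for CommentedKeySeq (the assumption being that the real class, as a tuple subclass, also compares and hashes equal to a plain tuple). Assigning through an equal key keeps the old key object in the dict, so the items have to be removed and re-inserted, and re-inserting all of them keeps the original order.

```python
class KeySeq(tuple):
    """Hypothetical stand-in for ruamel.yaml.comments.CommentedKeySeq."""


def rekey_tuples(mapping):
    """Replace plain tuple keys by KeySeq keys, in place, keeping key order."""
    items = list(mapping.items())  # snapshot, so the dict can be emptied safely
    mapping.clear()                # same effect as popping every key
    for key, value in items:
        if type(key) is tuple:
            key = KeySeq(key)
        mapping[key] = value       # re-inserting in order restores the order


# Naive attempt: the new key equals the old one, so the old key object stays.
naive = {(2, 3, 4): 'x'}
naive[KeySeq((2, 3, 4))] = 'x'
print(type(next(iter(naive))).__name__)  # tuple

d = {'first': 1, (2, 3, 4): {'a': [1, 2]}, 'last': 3}
rekey_tuples(d)
print([type(k).__name__ for k in d])     # ['str', 'KeySeq', 'str']
```

Removing only the tuple keys and adding them back would move them to the end of the dict, which is why the sketch (like prep() above) re-inserts every item.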
There is still no simple way to suppress the space between the sequence items, so you cannot get [2,3,4] instead of [2, 3, 4] without some major effort.
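If a small post-processing step turns out to be acceptable after all, one possible workaround is a narrowly scoped text filter that only touches flow sequences used as keys. The sketch below uses only the standard library; compact_flow_keys is a hypothetical helper name, and it assumes the key items contain no commas or brackets of their own (quoted strings with commas would break it). As far as I know, the dump() method of the new API accepts a transform callable that receives the dumped text, but treat that hookup as an assumption to verify against your installed version.

```python
import re

# A flow sequence that starts a line and is directly followed by ':' is a key.
_FLOW_KEY = re.compile(r'^([ ]*)\[([^\[\]\n]*)\](?=:)', re.MULTILINE)


def compact_flow_keys(text):
    """Rewrite '[2, 3, 4]:' as '[2,3,4]:'; all other text is left untouched."""
    def fix(match):
        items = [item.strip() for item in match.group(2).split(',')]
        return match.group(1) + '[' + ','.join(items) + ']'
    return _FLOW_KEY.sub(fix, text)


sample = "[2, 3, 4]:\n  a: [1, 2]\n  b: Hello World!\n"
print(compact_flow_keys(sample), end='')  # first line becomes: [2,3,4]:

# Assumed hookup, not executed here:
# yaml.dump(d, sys.stdout, transform=compact_flow_keys)
```

Flow sequences that are values, such as a: [1, 2], keep their spaces because the pattern only matches at the start of a line.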
You cannot get exactly what you want as output using ruamel.yaml.dump() without major rework of the internals.
- The output you want has indentation 2 for the keys of the nested mapping (a, b, etc.) and indentation 4 for the elements of the sequence that is the value for the a key (with the - pushed in 2 positions). That would at least require differentiating between indentation levels for mappings and sequences (if not for individual collections), and that is non-trivial.
- Your key sequence is compacted from the ", " (comma, space) that a "normal" flow style emits to just a ",". IIRC this cannot currently be influenced by any parameter, and since you have little contextual knowledge when emitting a collection, it is difficult to "not include the spaces when emitting a sequence that is a key". An additional option to dump() would require changes in several of the source files and classes.
Less difficult issues, with indication of solution:
- Your tuple has to be represented as a plain sequence to get rid of the tag !!python/tuple. As you don't want to affect all tuples, this is IMO best done by making a subclass of tuple and representing this as a sequence (optionally representing such a tuple as a list only if it is actually used as a key). You can use comments.CommentedKeySeq for that (assuming ruamel.yaml>=0.12.14, which has the proper representation support when using ruamel.yaml.round_trip_dump()).
- Your key is emitted as a complex key, marked with '?'. To avoid that, the emitter would have to accept that a SequenceStartEvent starts a simple key (if it has flow style and not block style). An additional issue is that such a SequenceStartEvent will then be "tested" to have a style attribute (which might indicate an explicit need for '?' on the key). This requires changing emitter.py:Emitter.check_simple_key() and emitter.py:Emitter.expect_block_mapping_key().
- Your scalar string value for c gets quotes, whereas your scalar string value for b doesn't. You can only get that kind of difference in output in ruamel.yaml by making them different types, e.g. by making it type scalarstring.SingleQuotedScalarString() (and using round_trip_dump()).
If you do:
import sys
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap, CommentedKeySeq
assert ruamel.yaml.version_info >= (0, 12, 14)
data = CommentedMap()
data[CommentedKeySeq((2, 3, 4))] = cm = CommentedMap()
cm['a'] = [1, 2]
cm['b'] = 'Hello World!'
cm['c'] = ruamel.yaml.scalarstring.SingleQuotedScalarString('Voilà!')
ruamel.yaml.round_trip_dump(data, sys.stdout, explicit_start=True, version=(1, 2))
you will get:
%YAML 1.2
---
[2, 3, 4]:
  a:
  - 1
  - 2
  b: Hello World!
  c: 'Voilà!'
which, apart from the now consistent indentation level of 2, the extra spaces in the flow style sequence, and the required use of round_trip_dump(), gets you as close to what you want as is possible without major rework.
Whether the above code is ugly as well or not is of course a matter of taste.
The output will, not incidentally, round-trip correctly when loaded using ruamel.yaml.round_trip_load(preserve_quotes=True).
If control over the quotes is not needed, and the order of your mapping keys is not important either, then you can also patch the normal dumper:
def my_key_repr(self, data):
    if isinstance(data, tuple):
        # emit tuple keys as flow-style sequences instead of !!python/tuple
        return self.represent_sequence(u'tag:yaml.org,2002:seq', data,
                                       flow_style=True)
    return ruamel.yaml.representer.SafeRepresenter.represent_key(self, data)

ruamel.yaml.representer.Representer.represent_key = my_key_repr
Then you can use a normal sequence:
data = {}
data[(2, 3, 4)] = cm = {}
cm['a'] = [1, 2]
cm['b'] = 'Hello World!'
cm['c'] = 'Voilà!'
ruamel.yaml.dump(data, sys.stdout, allow_unicode=True, explicit_start=True, version=(1, 2))
will give you:
%YAML 1.2
---
[2, 3, 4]:
  a: [1, 2]
  b: Hello World!
  c: Voilà!
Please note that you need to explicitly allow unicode in your output (the default with round_trip_dump()) using allow_unicode=True.
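The patching approach replaces a method on the class itself, so it affects every dump in the process. The following toy sketch illustrates that pattern in isolation: ToyRepresenter and flow_tuple_key are hypothetical names and not part of ruamel.yaml, and the string results only mimic what a representer would produce. It keeps a reference to the unpatched method and falls back to it for non-tuple keys.

```python
class ToyRepresenter:
    """Hypothetical stand-in for a YAML representer; not the ruamel.yaml class."""

    def represent_key(self, data):
        return 'scalar:' + str(data)


# Keep the unpatched method, so other key types still get the old behaviour.
_original_represent_key = ToyRepresenter.represent_key


def flow_tuple_key(self, data):
    if isinstance(data, tuple):
        # Tuples are rendered like a flow-style sequence.
        return '[' + ', '.join(str(item) for item in data) + ']'
    return _original_represent_key(self, data)


# Class-level patch: every existing and future instance is affected.
ToyRepresenter.represent_key = flow_tuple_key

rep = ToyRepresenter()
print(rep.represent_key((2, 3, 4)))  # [2, 3, 4]
print(rep.represent_key('a'))        # scalar:a
```

Because the patch is global, subclassing the representer and overriding the method there may be the less intrusive choice when other code in the same process dumps tuples as well.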
¹ Disclaimer: I am the author of ruamel.yaml.