I'd like to be able to dump a dictionary containing long strings that I'd like to have in the block style for readability. For example:
    foo: |
      this is a block literal
    bar: >
      this is a folded block
PyYAML supports the loading of documents with this style but I can't seem to find a way to dump documents this way. Am I missing something?
    import yaml

    class folded_unicode(unicode): pass
    class literal_unicode(unicode): pass

    def folded_unicode_representer(dumper, data):
        return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')

    def literal_unicode_representer(dumper, data):
        return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')

    yaml.add_representer(folded_unicode, folded_unicode_representer)
    yaml.add_representer(literal_unicode, literal_unicode_representer)

    data = {
        'literal': literal_unicode(
            u'by hjw ___\n'
            ' __ /.-.\\\n'
            ' / )_____________\\\\ Y\n'
            ' /_ /=== == === === =\\ _\\_\n'
            '( /)=== == === === == Y \\\n'
            ' `-------------------( o )\n'
            ' \\___/\n'),
        'folded': folded_unicode(
            u'It removes all ordinary curses from all equipped items. '
            'Heavy or permanent curses are unaffected.\n')}

    print yaml.dump(data)
The result:
    folded: >
      It removes all ordinary curses from all equipped items. Heavy or permanent curses
      are unaffected.
    literal: |
      by hjw ___
       __ /.-.\
       / )_____________\\ Y
       /_ /=== == === === =\ _\_
      ( /)=== == === === == Y \
       `-------------------( o )
       \___/
For completeness, one should also have str implementations, but I'm going to be lazy :-)
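On Python 3, where str is already a Unicode type, one pair of subclasses covers both cases. The following is a minimal sketch under that assumption and is not part of the original answer. The names FoldedStr, LiteralStr and doc are made up for illustration; the PyYAML calls used are yaml.add_representer, represent_scalar, yaml.dump and yaml.safe_load.

```python
import yaml


class FoldedStr(str):
    """Hypothetical marker type: ask for the folded ('>') block style."""


class LiteralStr(str):
    """Hypothetical marker type: ask for the literal ('|') block style."""


def represent_folded(dumper, data):
    # Emit a plain YAML string, but request the folded style.
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='>')


def represent_literal(dumper, data):
    # Emit a plain YAML string, but request the literal style.
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')


# Registers the representers on the default Dumper used by yaml.dump.
yaml.add_representer(FoldedStr, represent_folded)
yaml.add_representer(LiteralStr, represent_literal)

doc = {
    'literal': LiteralStr('line one\nline two\n'),
    'folded': FoldedStr('a short folded line\n'),
}
text = yaml.dump(doc)
print(text)
```

Because the tag is still the standard string tag, the output should load back as ordinary strings with yaml.safe_load.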
pyyaml does support dumping literal or folded blocks.
Representer.add_representer
defining types:
    class folded_str(str): pass
    class literal_str(str): pass
    class folded_unicode(unicode): pass
    class literal_unicode(unicode): pass
Then you can define the representers for those types. Please note that while Gary's solution works great for unicode, you may need some more work to get strings to work right (see implementation of represent_str).
    import yaml
    from yaml.representer import SafeRepresenter

    def change_style(style, representer):
        def new_representer(dumper, data):
            scalar = representer(dumper, data)
            scalar.style = style
            return scalar
        return new_representer

    # represent_str does handle some corner cases, so use that
    # instead of calling represent_scalar directly
    represent_folded_str = change_style('>', SafeRepresenter.represent_str)
    represent_literal_str = change_style('|', SafeRepresenter.represent_str)
    represent_folded_unicode = change_style('>', SafeRepresenter.represent_unicode)
    represent_literal_unicode = change_style('|', SafeRepresenter.represent_unicode)
Then you can add those representers to the default dumper:
    yaml.add_representer(folded_str, represent_folded_str)
    yaml.add_representer(literal_str, represent_literal_str)
    yaml.add_representer(folded_unicode, represent_folded_unicode)
    yaml.add_representer(literal_unicode, represent_literal_unicode)
... and test it:
    data = {
        'foo': literal_str('this is a\nblock literal'),
        'bar': folded_unicode('this is a folded block'),
    }

    print yaml.dump(data)
result:
    bar: >-
      this is a folded block
    foo: |-
      this is a
      block literal
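The same wrapper idea can be sketched for Python 3, where only represent_str is needed. This variant also registers the representers on a private Dumper subclass instead of the global default, so other code that calls yaml.dump is unaffected. The names BlockDumper, FoldedStr and LiteralStr are hypothetical; add_representer as a Dumper classmethod and the Dumper= keyword of yaml.dump are PyYAML features.

```python
import yaml
from yaml.representer import SafeRepresenter


class FoldedStr(str):
    """Hypothetical marker type for the folded style."""


class LiteralStr(str):
    """Hypothetical marker type for the literal style."""


def change_style(style, representer):
    # Wrap an existing representer and override the style on the node it returns.
    def new_representer(dumper, data):
        scalar = representer(dumper, data)
        scalar.style = style
        return scalar
    return new_representer


class BlockDumper(yaml.SafeDumper):
    """Hypothetical private dumper, so the global Dumper stays untouched."""


BlockDumper.add_representer(FoldedStr, change_style('>', SafeRepresenter.represent_str))
BlockDumper.add_representer(LiteralStr, change_style('|', SafeRepresenter.represent_str))

data = {
    'foo': LiteralStr('this is a\nblock literal'),
    'bar': FoldedStr('this is a folded block'),
}
out = yaml.dump(data, Dumper=BlockDumper)
print(out)
```

Keeping the registration local is a design choice: it costs one extra argument at each dump call, but avoids surprising unrelated code that shares the process.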
default_style
If you are interested in having all your strings follow a default style, you can also use the default_style keyword argument, e.g.:
    >>> data = { 'foo': 'line1\nline2\nline3' }
    >>> print yaml.dump(data, default_style='|')
    "foo": |-
      line1
      line2
      line3
or for folded literals:
    >>> print yaml.dump(data, default_style='>')
    "foo": >-
      line1

      line2

      line3
or for double-quoted literals:
    >>> print yaml.dump(data, default_style='"')
    "foo": "line1\nline2\nline3"
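As a quick sanity check, the three styles should all describe the same string, so each dump should load back to the original data. A minimal Python 3 sketch of that check follows; the variable name dumps is made up for illustration, and only yaml.dump with default_style and yaml.safe_load are used.

```python
import yaml

data = {'foo': 'line1\nline2\nline3'}

# Dump the same mapping once per style: literal, folded and double-quoted.
dumps = {style: yaml.dump(data, default_style=style) for style in ('|', '>', '"')}

for style, text in dumps.items():
    print('default_style=%r' % style)
    print(text)
    # Whatever the presentation, the loaded value should equal the input.
    assert yaml.safe_load(text) == data
```

Note that default_style also applies to keys, which is why the key shows up quoted in the output above.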
Here is an example of something you may not expect:
    data = {
        'foo': literal_str('this is a\nblock literal'),
        'bar': folded_unicode('this is a folded block'),
        'non-printable': literal_unicode('this has a \t tab in it'),
        'leading': literal_unicode(' with leading white spaces'),
        'trailing': literal_unicode('with trailing white spaces '),
    }

    print yaml.dump(data)
results in:
    bar: >-
      this is a folded block
    foo: |-
      this is a
      block literal
    leading: |2-
       with leading white spaces
    non-printable: "this has a \t tab in it"
    trailing: "with trailing white spaces "
See the YAML spec for escaped characters (Section 5.7):
Note that escape sequences are only interpreted in double-quoted scalars. In all other scalar styles, the “\” character has no special meaning and non-printable characters are not available.
If you want to preserve non-printable characters (e.g. TAB), you need to use double-quoted scalars. If you are able to dump a scalar with literal style, and there is a non-printable character (e.g. TAB) in there, your YAML dumper is non-compliant.
E.g. pyyaml detects the non-printable character \t and uses the double-quoted style even though a default style is specified:
    >>> data = { 'foo': 'line1\nline2\n\tline3' }
    >>> print yaml.dump(data, default_style='"')
    "foo": "line1\nline2\n\tline3"
    >>> print yaml.dump(data, default_style='>')
    "foo": "line1\nline2\n\tline3"
    >>> print yaml.dump(data, default_style='|')
    "foo": "line1\nline2\n\tline3"
Another bit of useful information in the spec is:
All leading and trailing white space characters are excluded from the content
This means that if your string does have leading or trailing white space, these would not be preserved in scalar styles other than double-quoted. As a consequence, pyyaml tries to detect what is in your scalar and may force the double-quoted style.
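One practical consequence is that a requested block style is only a hint: the dumper may pick another style, but the value should still survive a round trip. The following Python 3 sketch checks that for a few awkward strings. The names LiteralStr, BlockDumper, samples and results are hypothetical, and which style is actually emitted for the tab and trailing-space cases may depend on the PyYAML version, so the code only asserts the round trip.

```python
import yaml


class LiteralStr(str):
    """Hypothetical marker type: request the literal block style."""


def represent_literal(dumper, data):
    # Request '|' style; the emitter may still fall back to another style.
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')


class BlockDumper(yaml.SafeDumper):
    """Hypothetical private dumper carrying the custom representer."""


BlockDumper.add_representer(LiteralStr, represent_literal)

samples = {
    'plain': 'two\nlines',
    'tab': 'has a \t tab',
    'trailing': 'trailing space ',
}

results = {}
for name, text in samples.items():
    dumped = yaml.dump({name: LiteralStr(text)}, Dumper=BlockDumper)
    results[name] = dumped
    print(dumped)
    # The presentation may differ per sample, but the content must not.
    assert yaml.safe_load(dumped) == {name: text}
```

If the exact presentation matters for a given string, inspecting the dumped text as done here is a simple way to see which style was chosen.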