Using ruamel.yaml, I am trying to produce YAML in a particular style: single-line strings should start on the same line as the :, multi-line strings should use a folded scalar style (|/|-), and lines should be limited to a certain number of characters (word-wrapped).
My attempt so far, heavily influenced by a similar function called walk_tree in the sources:
#!/usr/bin/env python
import ruamel.yaml
from ruamel.yaml.scalarstring import ScalarString, PreservedScalarString

def walk_tree(base):
    from ruamel.yaml.compat import string_types
    if isinstance(base, dict):
        for k in base:
            v = base[k]
            if isinstance(v, string_types):
                v = v.replace('\r\n', '\n').replace('\r', '\n').strip()
                base[k] = ScalarString(v) if '\n' in v else v
            else:
                walk_tree(v)
    elif isinstance(base, list):
        for idx, elem in enumerate(base):
            if isinstance(elem, string_types) and '\n' in elem:
                print(elem) # @Anthon: this print is in the original code as well
                base[idx] = preserve_literal(elem)
            else:
                walk_tree(elem)

with open("input.yaml", "r") as fi:
    inp = fi.read()
loader = ruamel.yaml.RoundTripLoader
data = ruamel.yaml.load(inp, loader)
walk_tree(data)
dumper = ruamel.yaml.RoundTripDumper
with open("output.yaml", "w") as fo:
    ruamel.yaml.dump(data, fo, Dumper=dumper, allow_unicode=True)
But then I get an exception: ruamel.yaml.representer.RepresenterError: cannot represent an object: …. I get no exception if I replace ScalarString with PreservedScalarString, as is the case in the original walk_tree code, but then I get the literal blocks again, which is not what I want.
So how can my code be fixed so that it will work?
The class ScalarString is a base class for LiteralScalarString; it has no representer, as you found out. You should just make/keep this a Python string, as that deals with special characters appropriately (quoting strings that need to be quoted to conform to the YAML specification).
Assuming you have input like this:
- 1
- abc: |
    this is a short string scalar with a newline
    in it
- "there are also a multiline\nsequence element\nin this file\nand it is longer"
You probably want to do something like:
import ruamel.yaml
from ruamel.yaml.scalarstring import LiteralScalarString, preserve_literal

def walk_tree(base):
    from ruamel.yaml.compat import string_types

    def test_wrap(v):
        # normalize line endings, then keep short strings as plain Python strings
        v = v.replace('\r\n', '\n').replace('\r', '\n').strip()
        return v if len(v) < 72 else preserve_literal(v)

    if isinstance(base, dict):
        for k in base:
            v = base[k]
            if isinstance(v, string_types) and '\n' in v:
                base[k] = test_wrap(v)
            else:
                walk_tree(v)
    elif isinstance(base, list):
        for idx, elem in enumerate(base):
            if isinstance(elem, string_types) and '\n' in elem:
                base[idx] = test_wrap(elem)
            else:
                walk_tree(elem)

yaml = ruamel.yaml.YAML()
with open("input.yaml", "r") as fi:
    data = yaml.load(fi)
walk_tree(data)
with open("output.yaml", "w") as fo:
    yaml.dump(data, fo)
to get output:
- 1
- abc: "this is a short string scalar with a newline\nin it"
- |-
  there are also a multiline
  sequence element
  in this file
  and it is longer
Some notes:
- LiteralScalarString is preferred over PreservedScalarString. The latter name is a remnant from the time it was the only preserved string type.
- Your code did not import preserve_literal, although it was still used in the copied code.
- data[1]['abc'] loads as LiteralScalarString. If you want to preserve existing literal style string scalars, you should test for those before testing on type string_types.
- The example uses an instance of YAML() rather than the load() and dump() functions with RoundTripLoader and RoundTripDumper.
- You might have to set the width attribute to something like 1000 to prevent automatic line wrapping if you increase 72 in the example to above the default of 80 (yaml.width = 1000).