Best way to use ruamel.yaml to dump YAML to string (NOT to stream)

Tags: python-3.x, ruamel.yaml

In the past, I did something like some_fancy_printing_loggin_func(yaml.dump(...), ...), using the backward-compatible part of ruamel.yaml, but I want to convert my code to use the latest API so that I can take advantage of some of the new formatting settings.

However, I hate that I have to specify a stream to ruamel.yaml.YAML.dump() ... I don't want it to write directly to a stream; I just want it to return the output to the caller. What am I missing?

PS: I know I can do something like the following, though of course I'm trying to avoid it.

import io
f = io.StringIO()
yml.dump(myobj, f)
f.seek(0)
my_logging_func(f.read())


asked Dec 03 '17 03:12

rsaw


I put this answer (a small wrapper around ruamel.yaml) into a pip module, ez_yaml, after needing this functionality so frequently.

TLDR

pip install ez_yaml

import ez_yaml

ez_yaml.to_string(obj=your_object    , options={})

ez_yaml.to_object(file_path=your_path, options={})
ez_yaml.to_object(string=your_string , options={})

ez_yaml.to_file(your_object, file_path=your_path)

Hacky / Copy-Paste Solution to Original Question

def object_to_yaml_str(obj, options=None):
    #
    # setup yaml part (customize this, probably move it outside this def)
    #
    import ruamel.yaml
    from io import StringIO
    yaml = ruamel.yaml.YAML()
    yaml.version = (1, 2)
    yaml.indent(mapping=3, sequence=2, offset=0)
    yaml.allow_duplicate_keys = True
    # show None as "null" rather than as an empty value
    def my_represent_none(self, data):
        return self.represent_scalar('tag:yaml.org,2002:null', 'null')
    yaml.representer.add_representer(type(None), my_represent_none)

    #
    # the to-string part
    #
    if options is None:
        options = {}
    string_stream = StringIO()
    # options are forwarded as keyword arguments to YAML.dump()
    yaml.dump(obj, string_stream, **options)
    output_str = string_stream.getvalue()
    string_stream.close()
    return output_str
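The StringIO pattern used in object_to_yaml_str is not specific to ruamel.yaml: any function with a dump(obj, stream) signature can be wrapped the same way. Below is a minimal sketch with a hypothetical helper name, dump_to_str. It is demonstrated with the standard library's json.dump so it runs without extra packages; passing the bound method of a ruamel.yaml.YAML instance (e.g. yaml.dump) instead should work the same way, since it also takes (data, stream).

```python
import io
import json
from typing import Any, Callable, TextIO


def dump_to_str(dump: Callable[[Any, TextIO], Any], obj: Any) -> str:
    """Run a stream-oriented dump function and return what it wrote as a str.

    dump_to_str is a hypothetical helper; `dump` is any callable taking
    (obj, stream), e.g. json.dump or a ruamel.yaml.YAML instance's .dump.
    """
    with io.StringIO() as buf:
        dump(obj, buf)
        # getvalue() returns the whole buffer, so no seek(0)/read() is needed
        return buf.getvalue()


if __name__ == "__main__":
    text = dump_to_str(json.dump, {"user": "rsaw", "question": 47614862})
    print(text)  # {"user": "rsaw", "question": 47614862}
```

Using getvalue() inside the with block avoids both the seek(0)/read() dance and the explicit close().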

Original Answer (if you want to customize the config/options more)

import ruamel.yaml
from io import StringIO
from pathlib import Path

# setup loader (basically options)
yaml = ruamel.yaml.YAML()
yaml.version = (1, 2)
yaml.indent(mapping=3, sequence=2, offset=0)
yaml.allow_duplicate_keys = True
yaml.explicit_start = False
# show None as "null" rather than as an empty value
def my_represent_none(self, data):
    return self.represent_scalar('tag:yaml.org,2002:null', 'null')
yaml.representer.add_representer(type(None), my_represent_none)

# o->s
def object_to_yaml_str(obj, options=None):
    if options is None:
        options = {}
    string_stream = StringIO()
    # options are forwarded as keyword arguments to YAML.dump()
    yaml.dump(obj, string_stream, **options)
    output_str = string_stream.getvalue()
    string_stream.close()
    return output_str

# s->o
def yaml_string_to_object(string, options=None):
    if options is None:
        options = {}
    return yaml.load(string, **options)

# f->o
def yaml_file_to_object(file_path, options=None):
    if options is None:
        options = {}
    as_path_object = Path(file_path)
    return yaml.load(as_path_object, **options)

# o->f
def object_to_yaml_file(obj, file_path, options=None):
    if options is None:
        options = {}
    as_path_object = Path(file_path)
    with as_path_object.open('w') as output_file:
        return yaml.dump(obj, output_file, **options)

# 
# string examples
# 
yaml_string = object_to_yaml_str({ (1,2): "hi" })
print("yaml string:", yaml_string)
obj = yaml_string_to_object(yaml_string)
print("obj from string:", obj)

# 
# file examples
# 
obj = yaml_file_to_object("./thingy.yaml")
print("obj from file:", obj)
object_to_yaml_file(obj, file_path="./thingy2.yaml")
print("saved that to a file")

Rant

I appreciate Mike Night solving the original "I just want it to return the output to the caller" problem, and calling out that Anthon's post fails to answer the question, a point I will expand on. Anthon, your module is great; its round-tripping is impressive and one of the few ever made. But (and this happens often on Stack Overflow) it is not the job of the author to make other people's code runtime-efficient. Explicit trade-offs are great: an author should help people understand the consequences of their choices, and adding a warning, including "slow" in the name, etc. can be very helpful. However, the methods in the ruamel.yaml documentation, such as creating an entire inherited class, are not "explicit". They are encumbering and obfuscating, making the task difficult to perform and making it time-consuming for others to understand what that additional code does and why it exists.

Many users rightfully do not care about runtime performance. The runtime of my program, without YAML, is 2 weeks. A 500,000-line YAML file is read in seconds. Both the 2 weeks and the few seconds are irrelevant to the project because they are CPU time, and the project is billed purely by man-hours.

The YAML was already a string object because of other operations being performed on it. Forcing it into a stream actually causes more overhead. Removing the need for the string form of the YAML would involve rewriting several major libraries and potentially months of effort, making streams a highly impractical choice in this situation.

Even assuming that keeping it as a stream were possible, and that the project were billed by CPU time instead of man-hours, optimizing a 500,000-line YAML file held as a string would yield a ≤0.0001% increase in efficiency. The extra hour spent figuring out the answer to this question, and the time spent by others understanding the workaround, could instead have been spent on improving the efficiency of one of the C functions that is called 100 times a second. So even when we do care about CPU time, this particular method still fails to be a useful choice.

A post that ignores the question while also suggesting that users sink potentially large amounts of time into rewriting their applications is not an answer. Respect others by assuming they generally know what they are doing and are aware of the alternatives. Then offers of potentially more efficient methods will be met with appreciation rather than rejection.

[end rant]


answered Oct 13 '22 12:10

Jeff Hykin

I am not sure you are really missing anything. If anything, it might be that when you are working with streams you should, preferably, continue to work with streams. That, however, is something many users of ruamel.yaml and PyYAML seem to miss, and therefore they do:

print(dump(data))

instead of

dump(data, sys.stdout)

The former might be fine for the non-realistic data used in the (PyYAML) documentation, but it leads to bad habits for real data.

The best solution is to make your my_logging_func() stream oriented. This can e.g. be done as follows:

import sys
import ruamel.yaml

data = dict(user='rsaw', question=47614862)

class MyLogger:
    def write(self, s):
        sys.stdout.write(s.decode('utf-8'))

my_logging_func = MyLogger()
yml = ruamel.yaml.YAML()
yml.dump(data, my_logging_func)

which gives:

user: rsaw
question: 47614862

but note that MyLogger.write() gets called multiple times (in this case eight times), and if you need to work on a line at a time, you have to do line buffering.
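The line buffering mentioned above could look roughly like the following sketch. LineLogger and finish() are hypothetical names, not part of ruamel.yaml. The class accepts the chunks passed to write() (bytes, as in MyLogger above, or str), keeps any incomplete trailing line in a buffer, and calls a line-oriented function once per complete line. It uses an incremental UTF-8 decoder so a multi-byte character split across two chunks is still decoded correctly. Because the final chunk may not end in a newline, finish() should be called after dumping to emit any leftover text.

```python
import codecs
from typing import Callable, List, Union


class LineLogger:
    """Hypothetical stream adapter: collects write() chunks, emits whole lines."""

    def __init__(self, emit: Callable[[str], None], encoding: str = "utf-8"):
        self._emit = emit  # called once per complete line (newline stripped)
        # incremental decoder copes with characters split across bytes chunks
        self._decoder = codecs.getincrementaldecoder(encoding)()
        self._pending = ""  # text after the last newline seen so far

    def write(self, s: Union[bytes, str]) -> None:
        if isinstance(s, bytes):
            s = self._decoder.decode(s)
        self._pending += s
        # split off every complete line; keep the unfinished tail buffered
        *lines, self._pending = self._pending.split("\n")
        for line in lines:
            self._emit(line)

    def finish(self) -> None:
        """Emit any trailing text that was not terminated by a newline."""
        self._pending += self._decoder.decode(b"", final=True)
        if self._pending:
            self._emit(self._pending)
            self._pending = ""


if __name__ == "__main__":
    collected: List[str] = []
    logger = LineLogger(collected.append)
    # simulate the kind of chunked write() calls a YAML emitter makes
    for chunk in (b"user", b": rsaw\nquest", b"ion: 47614862\n"):
        logger.write(chunk)
    logger.finish()
    print(collected)  # ['user: rsaw', 'question: 47614862']
```

With ruamel.yaml, an instance should be usable in place of MyLogger, e.g. `yml.dump(data, LineLogger(some_line_func))` followed by `.finish()`, where `some_line_func` is your own line-oriented logging function.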

If you really need to process your YAML as bytes or str, you can install the appropriate plugin (ruamel.yaml.bytes or ruamel.yaml.string, respectively) and do:

import ruamel.yaml

yaml = ruamel.yaml.YAML(typ=['rt', 'string'])
data = dict(abc=42, help=['on', 'its', 'way'])
print('retval', yaml.dump_to_string(data))

Or process the result of yaml.dump_to_string(data), or its equivalent yaml.dumps(data), as you see necessary. Replacing string with bytes in the above doesn't decode the UTF-8 stream back to str, but keeps it as bytes.