
Best way to use ruamel.yaml to dump YAML to string (NOT to stream)

In the past, I did something like some_fancy_printing_loggin_func(yaml.dump(...), ...), using the backward-compatible part of ruamel.yaml, but I want to convert my code to use the latest API so that I can take advantage of some of the new formatting settings.

However, I hate that I have to specify a stream to ruamel.yaml.YAML.dump() ... I don't want it to write directly to a stream; I just want it to return the output to the caller. What am I missing?

PS: I know I can do something like the following, though of course I'm trying to avoid it.

f = io.StringIO()
yml.dump(myobj, f)
f.seek(0)
my_logging_func(f.read())
rsaw asked Dec 03 '17 03:12



2 Answers

I needed this functionality so frequently that I packaged this answer (a small wrapper around ruamel.yaml) as a pip module, ez_yaml.

TLDR

pip install ez_yaml

import ez_yaml

ez_yaml.to_string(obj=your_object    , options={})

ez_yaml.to_object(file_path=your_path, options={})
ez_yaml.to_object(string=your_string , options={})

ez_yaml.to_file(your_object, file_path=your_path)

Hacky / Copy-Paste Solution to Original Question

def object_to_yaml_str(obj, options=None):
    # 
    # setup yaml part (customize this, probably move it outside this def)
    # 
    import ruamel.yaml
    yaml = ruamel.yaml.YAML()
    yaml.version = (1, 2)
    yaml.indent(mapping=3, sequence=2, offset=0)
    yaml.allow_duplicate_keys = True
    # show null
    def my_represent_none(self, data):
        return self.represent_scalar(u'tag:yaml.org,2002:null', u'null')
    yaml.representer.add_representer(type(None), my_represent_none)
    
    # 
    # the to-string part
    # 
    if options is None:
        options = {}
    from io import StringIO
    string_stream = StringIO()
    yaml.dump(obj, string_stream, **options)
    output_str = string_stream.getvalue()
    string_stream.close()
    return output_str
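
The to-string part above does not depend on the particular YAML configuration, so it can be factored into a small helper that captures whatever a stream-oriented dump callable writes. The following is only a minimal sketch, not part of ruamel.yaml or ez_yaml: the name dump_to_str is hypothetical, and json.dump is used purely as a stand-in writer so that the snippet runs without ruamel.yaml installed. With ruamel.yaml, one would presumably pass the dump method of an already configured YAML() instance instead, since it is called the same way (data first, stream second).

```python
import io
import json


def dump_to_str(dump, obj, **options):
    """Call dump(obj, stream, **options) and return what it wrote as a str.

    `dump` is any callable that writes text to a file-like object
    (hypothetical helper, not an API of any library).
    """
    with io.StringIO() as stream:
        dump(obj, stream, **options)
        # getvalue() returns everything written so far; no seek()/read() needed.
        return stream.getvalue()


# json.dump is only a stand-in here; it is not YAML output.
text = dump_to_str(json.dump, {"user": "rsaw", "question": 47614862})
print(text)
```

Keeping the helper separate from the dumper setup means the configuration is built once and reused, rather than being recreated on every call.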

Original Answer (if you want to customize the config/options more)

import ruamel.yaml
from io import StringIO
from pathlib import Path

# setup loader (basically options)
yaml = ruamel.yaml.YAML()
yaml.version = (1, 2)
yaml.indent(mapping=3, sequence=2, offset=0)
yaml.allow_duplicate_keys = True
yaml.explicit_start = False
# show null
def my_represent_none(self, data):
    return self.represent_scalar(u'tag:yaml.org,2002:null', u'null')
yaml.representer.add_representer(type(None), my_represent_none)

# o->s
def object_to_yaml_str(obj, options=None):
    if options is None:
        options = {}
    string_stream = StringIO()
    yaml.dump(obj, string_stream, **options)
    output_str = string_stream.getvalue()
    string_stream.close()
    return output_str

# s->o
def yaml_string_to_object(string, options=None):
    if options is None:
        options = {}
    return yaml.load(string, **options)

# f->o
def yaml_file_to_object(file_path, options=None):
    if options is None:
        options = {}
    as_path_object = Path(file_path)
    return yaml.load(as_path_object, **options)

# o->f
def object_to_yaml_file(obj, file_path, options=None):
    if options is None:
        options = {}
    as_path_object = Path(file_path)
    with as_path_object.open('w') as output_file:
        return yaml.dump(obj, output_file, **options)

# 
# string examples
# 
yaml_string = object_to_yaml_str({ (1,2): "hi" })
print("yaml string:", yaml_string)
obj = yaml_string_to_object(yaml_string)
print("obj from string:", obj)

# 
# file examples
# 
obj = yaml_file_to_object("./thingy.yaml")
print("obj from file:", obj)
object_to_yaml_file(obj, file_path="./thingy2.yaml")
print("saved that to a file")

Rant

I appreciate Mike Night solving the original "I just want it to return the output to the caller" and calling out that Anthon's post fails to answer the question. I will go further. Anthon, your module is great; round-trip support is impressive and one of the few implementations ever made. But (and this happens often on Stack Overflow) it is not the job of the author to make other people's code runtime-efficient. Explicit tradeoffs are great: an author should help people understand the consequences of their choices, and adding a warning, including "slow" in the name, etc. can be very helpful. However, the method in the ruamel.yaml documentation, creating an entire inherited class, is not "explicit". It is encumbering and obfuscating, making the task harder to perform and making it time-consuming for others to understand what that additional code is and why it exists.

Many users rightfully do not care about runtime performance. The runtime of my program, without YAML, is 2 weeks. A 500,000-line YAML file is read in seconds. Both the 2 weeks and the few seconds are irrelevant to the project because they are CPU time and the project is billed purely by man-hours.

The YAML was already a string object because of other operations being performed on it. Forcing it into a stream actually causes more overhead. Removing the need for the string form of the YAML would involve rewriting several major libraries and potentially months of effort, making streams a highly impractical choice in this situation.

Assuming keeping it as a stream were even possible, and that the project were billed by CPU time instead of man-hours, optimizing a 500,000-line-yaml-file-as-string would be a ≤0.0001% increase in efficiency. The extra hour spent figuring out the answer to this question, and the time spent by others understanding the workaround, could instead have been spent on improving the efficiency of one of the C functions that is being called 100 times a second. So even when we do care about CPU time, this particular method still fails to be a useful choice.

A post ignoring the question while also suggesting users sink potentially large amounts of time rewriting their applications is not an answer. Respect others by assuming they generally know what they are doing and are aware of the alternatives. Then offers of potentially more-efficient methods will be met with appreciation rather than rejection.

[end rant]

Jeff Hykin answered Oct 13 '22 12:10


I am not sure you are really missing anything. If anything, it might be that when you're working with streams you should, preferably, continue to work with streams. That is, however, something many users of ruamel.yaml and PyYAML seem to miss, and therefore they do:

print(dump(data))

instead of

dump(data, sys.stdout)

The former might be fine for the non-realistic data used in the (PyYAML) documentation, but it leads to bad habits for real data.

The best solution is to make your my_logging_func() stream oriented. This can e.g. be done as follows:

import sys
import ruamel.yaml

data = dict(user='rsaw', question=47614862)

class MyLogger:
    def write(self, s):
        sys.stdout.write(s.decode('utf-8'))

my_logging_func = MyLogger()
yml = ruamel.yaml.YAML()
yml.dump(data, my_logging_func)

which gives:

user: rsaw
question: 47614862

but note that MyLogger.write() gets called multiple times (in this case eight times), and if you need to work on a line at a time, you have to do line buffering.
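
The line buffering mentioned above could look roughly like the following sketch. It is an illustration under stated assumptions rather than anything provided by ruamel.yaml: the class name LineBufferingLogger, its emit_line callback and its finish() method are hypothetical, and the byte fragments in the demonstration are made up to simulate a dumper calling write() several times (they are not the actual eight calls). An incremental decoder is used so that a multi-byte UTF-8 character split across two fragments would still decode correctly.

```python
import codecs


class LineBufferingLogger:
    """File-like object that forwards only complete lines to a callback."""

    def __init__(self, emit_line):
        self.emit_line = emit_line  # called once per complete line
        self._pending = ""          # text received since the last newline
        self._decoder = codecs.getincrementaldecoder("utf-8")()

    def write(self, s):
        # Accept bytes as well as str; bytes are decoded incrementally.
        if isinstance(s, bytes):
            s = self._decoder.decode(s)
        self._pending += s
        # Everything before the last newline is complete; keep the tail.
        *lines, self._pending = self._pending.split("\n")
        for line in lines:
            self.emit_line(line)

    def finish(self):
        # Emit a trailing partial line, if there is one.
        if self._pending:
            self.emit_line(self._pending)
            self._pending = ""


collected = []
logger = LineBufferingLogger(collected.append)
# Simulated fragments, standing in for the dumper's repeated write() calls.
for chunk in [b"user", b":", b" rsaw", b"\n", b"question: 4761", b"4862\n"]:
    logger.write(chunk)
logger.finish()
print(collected)
```

An instance of such a class would be passed as the stream argument in place of MyLogger(), with emit_line set to whatever per-line logging function is in use.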

If you really need to process your YAML as bytes or str, you can install the appropriate plugin (ruamel.yaml.bytes or ruamel.yaml.string, respectively) and do:

import ruamel.yaml  # the ruamel.yaml.string plugin must be installed as well

yaml = ruamel.yaml.YAML(typ=['rt', 'string'])
data = dict(abc=42, help=['on', 'its', 'way'])
print('retval', yaml.dump_to_string(data))

Or process the result of yaml.dump_to_string(data), or its equivalent yaml.dumps(data), as you see necessary. Replacing string with bytes in the above doesn't decode the UTF-8 stream back to str but keeps it as bytes.

Anthon answered Oct 13 '22 12:10