In the past, I did something like some_fancy_printing_loggin_func(yaml.dump(...), ...), using the backward-compatible part of ruamel.yaml, but I want to convert my code to use the latest API so that I can take advantage of some of the new formatting settings.
However, I hate that I have to specify a stream to ruamel.yaml.YAML.dump(). I don't want it to write directly to a stream; I just want it to return the output to the caller. What am I missing?
PS: I know I can do something like the following, though of course I'm trying to avoid it.
import io
f = io.StringIO()
yml.dump(myobj, f)  # yml is a ruamel.yaml.YAML() instance
f.seek(0)
my_logging_func(f.read())
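A minimal sketch of that same workaround, wrapped in a helper so the caller simply gets a string back. It assumes only that the dumper has a dump(data, stream) method that writes text, as ruamel.yaml.YAML().dump does when given a StringIO; the helper name dump_to_string is hypothetical. StringIO.getvalue() returns the whole buffer regardless of the current position, so the seek(0)/read() pair isn't needed.

```python
import io
from typing import Any


def dump_to_string(dumper: Any, data: Any) -> str:
    """Dump `data` with `dumper` into an in-memory text buffer and return the text.

    `dumper` is any object with a dump(data, stream) method, e.g. a
    ruamel.yaml.YAML() instance (assumption: it writes str to the stream).
    """
    with io.StringIO() as buf:
        dumper.dump(data, buf)
        # getvalue() returns everything written so far, independent of position
        return buf.getvalue()
```

With ruamel.yaml the call would look like my_logging_func(dump_to_string(yml, myobj)).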
With PyYAML, yaml.dump(data) without a stream returns the document as a str object. yaml.dump(data, encoding=('utf-8'|'utf-16-be'|'utf-16-le')) returns a bytes object in the specified encoding.
Type pip install ruamel-yaml on the command line and press Enter. This installs ruamel.yaml for your default Python installation.
I packaged this answer (a small wrapper around ruamel.yaml) as a pip module, ez_yaml, after needing this functionality so frequently:
pip install ez_yaml
import ez_yaml
ez_yaml.to_string(obj=your_object, options={})
ez_yaml.to_object(file_path=your_path, options={})
ez_yaml.to_object(string=your_string, options={})
ez_yaml.to_file(your_object, file_path=your_path)
def object_to_yaml_str(obj, options=None):
    #
    # setup yaml part (customize this, probably move it outside this def)
    #
    import ruamel.yaml
    yaml = ruamel.yaml.YAML()
    yaml.version = (1, 2)
    yaml.indent(mapping=3, sequence=2, offset=0)
    yaml.allow_duplicate_keys = True
    # show null
    def my_represent_none(self, data):
        return self.represent_scalar(u'tag:yaml.org,2002:null', u'null')
    yaml.representer.add_representer(type(None), my_represent_none)
    #
    # the to-string part
    #
    if options is None:
        options = {}
    from io import StringIO
    string_stream = StringIO()
    yaml.dump(obj, string_stream, **options)
    output_str = string_stream.getvalue()
    string_stream.close()
    return output_str
import ruamel.yaml
from io import StringIO
from pathlib import Path
# setup loader (basically options)
yaml = ruamel.yaml.YAML()
yaml.version = (1, 2)
yaml.indent(mapping=3, sequence=2, offset=0)
yaml.allow_duplicate_keys = True
yaml.explicit_start = False
# show null
def my_represent_none(self, data):
    return self.represent_scalar(u'tag:yaml.org,2002:null', u'null')
yaml.representer.add_representer(type(None), my_represent_none)
# o->s
def object_to_yaml_str(obj, options=None):
    if options is None:
        options = {}
    string_stream = StringIO()
    yaml.dump(obj, string_stream, **options)
    output_str = string_stream.getvalue()
    string_stream.close()
    return output_str
# s->o
def yaml_string_to_object(string, options=None):
    if options is None:
        options = {}
    return yaml.load(string, **options)
# f->o
def yaml_file_to_object(file_path, options=None):
    if options is None:
        options = {}
    as_path_object = Path(file_path)
    return yaml.load(as_path_object, **options)
# o->f
def object_to_yaml_file(obj, file_path, options=None):
    if options is None:
        options = {}
    as_path_object = Path(file_path)
    with as_path_object.open('w') as output_file:
        return yaml.dump(obj, output_file, **options)
#
# string examples
#
yaml_string = object_to_yaml_str({ (1,2): "hi" })
print("yaml string:", yaml_string)
obj = yaml_string_to_object(yaml_string)
print("obj from string:", obj)
#
# file examples
#
obj = yaml_file_to_object("./thingy.yaml")
print("obj from file:", obj)
object_to_yaml_file(obj, file_path="./thingy2.yaml")
print("saved that to a file")
I appreciate Mike Night solving the original problem ("I just want it to return the output to the caller") and pointing out that Anthon's post does not answer the question. I'll go further: Anthon, your module is great; its round-tripping is impressive and one of the few ever made. But (and this happens often on Stack Overflow) it is not an author's job to make other people's code runtime-efficient. Explicit trade-offs are great, and an author should help people understand the consequences of their choices; adding a warning, including "slow" in the name, and so on can be very helpful. However, the methods in the ruamel.yaml documentation, which involve creating an entire inherited class, are not "explicit". They are encumbering and obfuscating, making the task harder to carry out and making it time-consuming for others to understand what that additional code does and why it exists.
Many users rightfully do not care about runtime performance. The runtime of my program, without YAML, is 2 weeks. A 500,000-line YAML file is read in seconds. Both the 2 weeks and the few seconds are irrelevant to the project because they are CPU time, and the project is billed purely by man-hours.
The YAML was already a string object because of other operations being performed on it. Forcing it into a stream actually adds overhead. Removing the need for the string form of the YAML would involve rewriting several major libraries, potentially months of effort, which makes streams a highly impractical choice in this situation.
Even assuming that keeping it as a stream were possible, and that the project were billed by CPU time instead of man-hours, optimizing a 500,000-line YAML file held as a string would yield a ≤0.0001% increase in efficiency. The extra hour spent figuring out the answer to this question, and the time others spend understanding the workaround, could instead have been spent improving the efficiency of one of the C functions that is called 100 times a second. So even when we do care about CPU time, this particular method still fails to be a useful choice.
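As a rough check of the ≤0.0001% figure, the snippet below compares a saving of one second (a hypothetical number; the post only says the file is read "in seconds") against a two-week runtime.

```python
# Two weeks of runtime, in seconds.
runtime_s = 14 * 24 * 60 * 60  # 1,209,600 s

# Hypothetical CPU time saved by avoiding the string form (assumption: 1 second).
saved_s = 1.0

percent = saved_s / runtime_s * 100
print(f"{percent:.7f}%")  # prints 0.0000827%
```

Under that assumption the saving stays below 0.0001% as long as it is under about 1.2 seconds (0.0001% of 1,209,600 s).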
A post ignoring the question while also suggesting users sink potentially large amounts of time rewriting their applications is not an answer. Respect others by assuming they generally know what they are doing and are aware of the alternatives. Then offers of potentially more-efficient methods will be met with appreciation rather than rejection.
[end rant]
I am not sure you are really missing something. If anything, it might be that if you're working with streams you should, preferably, continue to work with streams. That is, however, something many users of ruamel.yaml and PyYAML seem to miss, and therefore they do:
print(dump(data))
instead of
dump(data, sys.stdout)
The former might be fine for the unrealistically small data used in the (PyYAML) documentation, but it leads to bad habits with real data.
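The same two habits exist in the standard library's json module, used here only as a stand-in for a YAML dumper so that the sketch runs without third-party packages. json.dump() writes the document to the stream in pieces, while json.dumps() first builds the whole string in memory; the resulting text is the same. CountingStream is a hypothetical helper that just counts write() calls.

```python
import io
import json


class CountingStream(io.StringIO):
    """A text buffer that counts how many times write() is called."""

    def __init__(self) -> None:
        super().__init__()
        self.writes = 0

    def write(self, s: str) -> int:
        self.writes += 1
        return super().write(s)


data = dict(user="rsaw", question=47614862)

# Habit 1: build the complete string first, then hand it over (like print(dump(data))).
whole = json.dumps(data)

# Habit 2: let the dumper write straight to the stream (like dump(data, sys.stdout)).
stream = CountingStream()
json.dump(data, stream)

print(whole == stream.getvalue())  # True: same text either way
print(stream.writes)               # several small writes, no intermediate string
```

The streaming version never holds the complete document as one string, which is the property that matters for large real-world data.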
The best solution is to make your my_logging_func() stream-oriented. This can, for example, be done as follows:
import sys
import ruamel.yaml
data = dict(user='rsaw', question=47614862)
class MyLogger:
    def write(self, s):
        # write() receives UTF-8 encoded bytes, so decode before writing to stdout
        sys.stdout.write(s.decode('utf-8'))
my_logging_func = MyLogger()
yml = ruamel.yaml.YAML()
yml.dump(data, my_logging_func)
which gives:
user: rsaw
question: 47614862
but note that MyLogger.write() gets called multiple times (in this case eight times), and if you need to work on a line at a time, you have to do line buffering.
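A minimal sketch of such line buffering, assuming (as in MyLogger above) that write() may receive UTF-8 bytes; str chunks are accepted as well. The class name LineBufferedLogger and the log_line callback are hypothetical. An incremental decoder is used so that a multi-byte character split across two write() calls would still decode correctly, and close() flushes a final line that lacks a trailing newline.

```python
import codecs
from typing import Callable, List, Union


class LineBufferedLogger:
    """File-like object that collects write() chunks and emits whole lines."""

    def __init__(self, log_line: Callable[[str], None]) -> None:
        self.log_line = log_line
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._pending = ""

    def write(self, s: Union[bytes, str]) -> None:
        if isinstance(s, bytes):
            s = self._decoder.decode(s)
        self._pending += s
        # Emit every complete line; keep the unfinished tail for the next write().
        *lines, self._pending = self._pending.split("\n")
        for line in lines:
            self.log_line(line)

    def close(self) -> None:
        # Flush decoder state and any last line without a trailing newline.
        self._pending += self._decoder.decode(b"", final=True)
        if self._pending:
            self.log_line(self._pending)
            self._pending = ""


# Simulated chunks, split at arbitrary points as a dumper might deliver them.
collected: List[str] = []
logger = LineBufferedLogger(collected.append)
logger.write(b"user: rs")
logger.write(b"aw\nquestion: 4761")
logger.write(b"4862\n")
logger.close()
print(collected)  # ['user: rsaw', 'question: 47614862']
```

With ruamel.yaml the object would be passed as the stream, e.g. yml.dump(data, logger), followed by logger.close().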
If you really need to process your YAML as bytes or str, you can install the appropriate plugin (ruamel.yaml.bytes or ruamel.yaml.string, respectively) and do:
import ruamel.yaml
yaml = ruamel.yaml.YAML(typ=['rt', 'string'])
data = dict(abc=42, help=['on', 'its', 'way'])
print('retval', yaml.dump_to_string(data))
Or process the result of yaml.dump_to_string(data), or its equivalent yaml.dumps(data), as you see necessary. Replacing string with bytes in the above doesn't decode the UTF-8 stream back to str but keeps it as bytes.
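For comparison, the subclassing approach criticized in the rant above can be kept small. The sketch below is not the plugin's implementation; it is a hypothetical mixin (ReturnStringIfNoStream) that returns a str when dump() is called without a stream and otherwise delegates unchanged. It assumes the base class's dump(data, stream, **kw) writes text to a StringIO, which is how ruamel.yaml.YAML is used with StringIO elsewhere in this thread.

```python
import io
from typing import Any


class ReturnStringIfNoStream:
    """Mixin: dump(data) with no stream returns the output as a str."""

    def dump(self, data: Any, stream: Any = None, **kw: Any) -> Any:
        if stream is not None:
            # Normal behaviour: write to the caller's stream.
            return super().dump(data, stream, **kw)
        # No stream given: dump into a temporary buffer and return its contents.
        buf = io.StringIO()
        super().dump(data, buf, **kw)
        return buf.getvalue()


# Hypothetical usage with ruamel.yaml (not executed here):
#     class MyYAML(ReturnStringIfNoStream, ruamel.yaml.YAML):
#         pass
#     my_logging_func(MyYAML().dump(myobj))
```

Putting the mixin first in the base list makes its dump() run first, and super() then reaches the real dumper's dump().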