The PyYAML package loads unmarked strings as either unicode or str objects, depending on their content.
I would like to use unicode objects throughout my program (and, unfortunately, can't switch to Python 3 just yet).
Is there an easy way to force PyYAML to always strings load unicode objects? I do not want to clutter my YAML with !!python/unicode
tags.
# Encoding: UTF-8
import yaml
menu= u"""---
- spam
- eggs
- bacon
- crème brûlée
- spam
"""
print yaml.load(menu)
Output: ['spam', 'eggs', 'bacon', u'cr\xe8me br\xfbl\xe9e', 'spam']
I would like: [u'spam', u'eggs', u'bacon', u'cr\xe8me br\xfbl\xe9e', u'spam']
Here's a function you could use to use to replace str
with unicode
types from the decoded output of PyYAML
:
def make_str_unicode(obj):
t = type(obj)
if t in (list, tuple):
if t == tuple:
# Convert to a list if a tuple to
# allow assigning to when copying
is_tuple = True
obj = list(obj)
else:
# Otherwise just do a quick slice copy
obj = obj[:]
is_tuple = False
# Copy each item recursively
for x in xrange(len(obj)):
obj[x] = make_str_unicode(obj[x])
if is_tuple:
# Convert back into a tuple again
obj = tuple(obj)
elif t == dict:
for k in obj:
if type(k) == str:
# Make dict keys unicode
k = unicode(k)
obj[k] = make_str_unicode(obj[k])
elif t == str:
# Convert strings to unicode objects
obj = unicode(obj)
return obj
print make_str_unicode({'blah': ['the', 'quick', u'brown', 124]})
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With