How to force PyYAML to load strings as unicode objects?

The PyYAML package loads unmarked strings as either unicode or str objects, depending on their content.

I would like to use unicode objects throughout my program (and, unfortunately, can't switch to Python 3 just yet).

Is there an easy way to force PyYAML to always load strings as unicode objects? I do not want to clutter my YAML with !!python/unicode tags.

# Encoding: UTF-8

import yaml

menu = u"""---
- spam
- eggs
- bacon
- crème brûlée
- spam
"""

print yaml.load(menu)

Output: ['spam', 'eggs', 'bacon', u'cr\xe8me br\xfbl\xe9e', 'spam']

I would like: [u'spam', u'eggs', u'bacon', u'cr\xe8me br\xfbl\xe9e', u'spam']

Petr Viktorin asked May 22 '10 23:05


1 Answer

Here's a function you could use to replace str objects with unicode objects in the decoded output of PyYAML:

def make_str_unicode(obj):
    t = type(obj)

    if t in (list, tuple):
        # Convert each item recursively, then rebuild
        # the same container type (list or tuple)
        obj = t(make_str_unicode(x) for x in obj)

    elif t == dict:
        # Build a new dict instead of assigning in place:
        # 'a' == u'a' and both hash alike in Python 2, so
        # obj[unicode(k)] = ... would keep the old str key
        obj = dict((make_str_unicode(k), make_str_unicode(v))
                   for k, v in obj.items())

    elif t == str:
        # Convert strings to unicode objects; this assumes ASCII
        # content, which is all that PyYAML returns as str
        obj = unicode(obj)
    return obj

print make_str_unicode({'blah': ['the', 'quick', u'brown', 124]})
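
An alternative that avoids post-processing is to change how the loader builds strings in the first place. On Python 2, PyYAML's default string constructor tries to encode each plain scalar as ASCII and only falls back to unicode when that fails, which is why the output mixes the two types. The sketch below registers a replacement constructor for the standard string tag on a loader subclass. The names UnicodeLoader and construct_unicode_str are made up for this example, and it assumes SafeLoader is enough for your documents (no Python-specific tags).

```python
import yaml


class UnicodeLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that keeps every string as unicode."""


def construct_unicode_str(loader, node):
    # construct_scalar returns the scalar text as a unicode object.
    # Returning it unchanged skips the ASCII-to-str conversion that
    # the stock Python 2 constructor performs.
    return loader.construct_scalar(node)


# Registering on the subclass leaves yaml.SafeLoader itself untouched.
UnicodeLoader.add_constructor(u'tag:yaml.org,2002:str', construct_unicode_str)

# Escapes are used so the snippet does not depend on the source encoding.
menu = u"""---
- spam
- cr\u00e8me br\u00fbl\u00e9e
"""

items = yaml.load(menu, Loader=UnicodeLoader)
```

Pass Loader=UnicodeLoader wherever you call yaml.load. Mapping keys are covered too, since they go through the same string constructor.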
cryo answered Sep 18 '22 05:09