Python's string.format() and Unicode

Tags: python, unicode

I'm having a problem with Python's string.format() and passing Unicode strings to it. This is similar to this older question, except that in my case the test code explodes on the print, not on the logging.info() call. Passing the same Unicode string object to a logging handler works fine.

This fails with the older % formatting just as it does with string.format(). To make sure the problem is the string object itself, and not print interacting badly with my terminal, I tried assigning the formatted string to a variable before printing.

def unicode_test():
    byte_string = '\xc3\xb4'
    unicode_string = unicode(byte_string, "utf-8")
    print "unicode object type: {}".format(type(unicode_string))
    output_string = "printed unicode object: {}".format(unicode_string)
    print output_string

if __name__ == '__main__':
    unicode_test()

The string object seems to assume it's getting ASCII.

% python -V
Python 2.7.2

% python ./unicodetest.py
unicode object type: <type 'unicode'>
Traceback (most recent call last):
  File "./unicodetest.py", line 10, in <module>
    unicode_test()
  File "./unicodetest.py", line 6, in unicode_test
    output_string = "printed unicode object: {}".format(unicode_string)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 0: ordinal not in range(128)

Trying to cast output_string as Unicode doesn't make any difference.

output_string = u"printed unicode object: {}".format(unicode_string)

Am I missing something here? The documentation for the string object seems pretty clear that this should work as I'm attempting to use it.

asked Dec 02 '12 22:12

mpounsett

1 Answer

No, this should not work (can you cite the part of the documentation that says so?), but it does work if the format pattern is itself unicode (or with the old % formatting, which 'promotes' the pattern to unicode instead of trying to 'demote' the arguments).

>>> x = "\xc3\xb4".decode('utf-8')
>>> x
u'\xf4'
>>> x + 'a'
u'\xf4a'
>>> 'a' + x
u'a\xf4'
>>> 'a %s' % x
u'a \xf4'
>>> 'a {}'.format(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 0: ordinal not in range(128)
>>> u'a {}'.format(x)
u'a \xf4'
>>> print u"Foo bar {}".format(x)
Foo bar ô
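
Applied to the test function from the question, a minimal sketch of the fix might look like the following. It assumes the only change needed is making the format pattern a unicode literal. The b'' and u'' prefixes are there so the snippet also runs unchanged on Python 3.3+, and returning the string instead of printing it is my own change to keep the formatting step separate from the console.

```python
# -*- coding: utf-8 -*-

def unicode_test():
    byte_string = b'\xc3\xb4'                      # UTF-8 bytes for the character U+00F4
    unicode_string = byte_string.decode('utf-8')   # decoded text: u'\xf4'
    # The pattern is a unicode literal, so format() never tries to
    # encode the argument with the ASCII codec.
    return u"printed unicode object: {}".format(unicode_string)

output_string = unicode_test()
```

Formatting succeeds here on any platform; whether a later print of output_string works still depends on the console.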

Edit: The print line may not work for you if the unicode string can't be encoded using your console's encoding. For example, on my Windows console:

>>> import sys
>>> sys.stdout.encoding
'cp852'
>>> u'\xf4'.encode('cp852')
'\x93'

On a UNIX console this may be related to your locale settings. It will also fail if you redirect output (like when using | in a shell). Most of these issues have been fixed in Python 3.
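
If you are stuck on Python 2, one way to avoid depending on the console is to encode explicitly before writing. This is only a sketch under a few assumptions: encode_for_stream is a hypothetical helper name, falling back to UTF-8 when the stream reports no encoding (Python 2 sets sys.stdout.encoding to None when output is piped) is my choice rather than anything the interpreter does, and errors="replace" trades exactness for not raising.

```python
import sys

def encode_for_stream(text, stream_encoding, errors="replace"):
    """Encode unicode text to bytes for an output stream (hypothetical helper).

    stream_encoding is whatever the stream reports, for example
    sys.stdout.encoding. It may be None, in which case UTF-8 is
    assumed here.
    """
    encoding = stream_encoding or "utf-8"
    # Characters the target encoding cannot represent become "?"
    # instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors)

message = u"Foo bar {}".format(u"\xf4")
# getattr guards against replacement stream objects without an encoding attribute.
encoded = encode_for_stream(message, getattr(sys.stdout, "encoding", None))
```

On Python 2 the resulting byte string can be passed straight to sys.stdout.write(). Setting the PYTHONIOENCODING environment variable is another option if you would rather not touch the code.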

answered Oct 04 '22 22:10

lqc
