I was given to understand that calling `print obj` would call `obj.__str__()`, which would in turn return a string to print to the console. Now I had a problem with Unicode where I could not print any non-ASCII characters. I got the typical "ascii out of range" stuff.
While experimenting, I found that the following worked:

    print obj.__str__()
    print obj.__repr__()

Both functions do exactly the same thing (`__str__()` just returns `self.__repr__()`). What did not work:

    print obj
The problem occurred only when using a character outside the ASCII range. The final solution was to do the following in `__str__()`:

    return self.__repr__().encode(sys.stdout.encoding)
Now it works for all parts. My question now is: where is the difference? Why does it work now? I would understand why this fixes things if nothing had worked before. But why did only the top part work, and not the bottom?
The OS is Windows 7 x64 with a default Windows command prompt, and the encoding is reported to be `cp850`. This is more of a general question to understand Python. My problem is already solved, but I am not 100% happy, mostly because calling `str(obj)` now yields a string that is not encoded the way I wanted.
    # -*- coding: utf-8 -*-
    class Sample(object):
        def __init__(self):
            self.name = u"üé"

        def __repr__(self):
            return self.name

        def __str__(self):
            return self.name

    obj = Sample()
    print obj.__str__(), obj.__repr__(), obj
Remove the last `obj` and it works. Keep it and it crashes with:

    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
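The remaining complaint, that `str(obj)` is no longer encoded the way the asker wanted, shows up at the byte level. With the `encode(sys.stdout.encoding)` fix, `str(obj)` returns bytes in whatever encoding the current console uses (`cp850` here), and code that later assumes a different encoding, such as UTF-8, may reject them. A minimal check, written so it runs on both Python 2.7 and Python 3 (the UTF-8 consumer is an assumed example, not something from the question):

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

text = "\xfc\xe9"                   # u"üé", the value from the question
cp850_bytes = text.encode("cp850")  # what str(obj) returns after the fix on a cp850 console

try:
    cp850_bytes.decode("utf-8")     # an assumed consumer that expects UTF-8
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc)
```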
My guess is that `print` does something like the following for an object `obj` it's meant to print:

1. If `obj` is a `unicode`, encode it to `sys.stdout.encoding` and print the result.
2. If `obj` is a `str`, print it directly.
3. If `obj` is anything else, call `str(obj)` and print that.

Step 1 is why `print obj.__str__()` works in your case.
Now, what `str(obj)` does is:

1. Call `obj.__str__()`.
2. If the result is a `str`, return it.
3. If the result is a `unicode`, encode it to `"ascii"` and return that.

Calling `obj.__str__()` directly skips steps 2-3, which is why you don't get the encoding failure.
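The two step lists above can be sketched as a toy model. `model_print` and `model_str` are hypothetical names; they only mimic the steps guessed above (this is not CPython's actual implementation) and return the bytes that would reach the console instead of writing them. The code avoids Python 2-only syntax so it runs under both 2.7 and 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

TEXT_TYPE = type("")  # unicode on Python 2 (via unicode_literals), str on Python 3


def model_str(obj):
    """Toy model of Python 2's str(obj), following the steps above."""
    result = obj.__str__()          # step 1: call __str__
    if isinstance(result, bytes):   # step 2: a byte string is returned as-is
        return result
    return result.encode("ascii")   # step 3: text is encoded as ASCII (may raise)


def model_print(obj, stdout_encoding):
    """Toy model of the guessed print logic; returns the bytes it would write."""
    if isinstance(obj, TEXT_TYPE):
        return obj.encode(stdout_encoding)  # text: uses the console encoding
    if isinstance(obj, bytes):
        return obj                          # byte string: written directly
    return model_str(obj)                   # anything else: goes through str()


class Sample(object):
    def __init__(self):
        self.name = "\xfc\xe9"  # u"üé"

    def __str__(self):
        return self.name


obj = Sample()
print(model_print(obj.__str__(), "cp850") == b"\x81\x82")  # True
try:
    model_print(obj, "cp850")
except UnicodeEncodeError as exc:
    print("str() path fails:", exc)
```

Passing `obj.__str__()` hands `model_print` a text object, so it takes the console-encoding branch; passing `obj` itself routes through `model_str` and its ASCII step, which is where the question's exception comes from.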
The problem isn't caused by how `print` works; it's caused by how `str()` works. `str()` ignores `sys.stdout.encoding`. Since it doesn't know what you want to do with the resulting string, the default encoding it uses can be considered arbitrary; `ascii` is as good or bad a choice as any.
To prevent this bug, make sure you return a `str` from `__str__()`, as the documentation tells you to do. A pattern you could use for Python 2.x might be:
    import sys

    class Foo(object):
        def __unicode__(self):
            return u'whatever'

        def __str__(self):
            return unicode(self).encode(sys.stdout.encoding)
(Only if you're sure you don't need the `str()` representation for anything but printing to the console.)
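One weakness of that pattern: on Python 2, `sys.stdout.encoding` is typically `None` when output is redirected to a file or pipe, so the pattern has no console encoding to use. The variant below is a sketch under those assumptions, not a standard recipe. It falls back to UTF-8 when no console encoding is available, uses the `"replace"` error handler (a design choice) so characters the console cannot show do not raise, and calls `self.__unicode__()` directly instead of `unicode(self)` so the same class body also runs on Python 3, where `__str__` must return text.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import sys

PY2 = sys.version_info[0] == 2


class Foo(object):
    def __unicode__(self):
        # The real text representation lives here (sample value assumed)
        return "\xfc\xe9"

    def __str__(self):
        text = self.__unicode__()
        if not PY2:
            # Python 3: __str__ must return text, no encoding step needed
            return text
        # Python 2: __str__ must return a byte string. sys.stdout.encoding
        # can be None (e.g. when output is piped), so fall back to UTF-8,
        # and replace characters the chosen codec cannot represent.
        encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
        return text.encode(encoding, "replace")
```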
First, if you look at the online documentation, `__str__` and `__repr__` have different purposes and should produce different output. So calling `__repr__` from `__str__` is not the best solution.
Second, `print` will call `__str__` and does not expect to receive non-ASCII characters because, well, `print` cannot guess how to convert them.
Finally, in recent versions of Python 2.x, `__unicode__` is the preferred method of creating a string representation for an object. There is an interesting explanation in Python str versus unicode.
So, to try and really answer the question, you could do something like:
    class Sample(object):
        def __init__(self):
            self.name = u"\xfc\xe9"

        # No need to implement __repr__. Let Python create the object repr for you
        def __str__(self):
            return unicode(self).encode('utf-8')

        def __unicode__(self):
            return self.name
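The two answers pick different encodings for `__str__`: the first pattern encodes with `sys.stdout.encoding`, while this one hard-codes `'utf-8'`. The check below (written to run on Python 2.7 and 3) compares what a console that decodes incoming bytes as `cp850`, like the asker's, would make of each. It is a sketch of the byte-level effect under that assumption, not a capture of real console output.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

text = "\xfc\xe9"  # u"üé", the sample value

utf8_bytes = text.encode("utf-8")   # what __str__ above returns
cp850_bytes = text.encode("cp850")  # what encoding with sys.stdout.encoding gives on cp850

# A cp850 console interprets whatever bytes it receives as cp850:
print(utf8_bytes.decode("cp850") == text)   # False: UTF-8 bytes show up as mojibake
print(cp850_bytes.decode("cp850") == text)  # True
```

Hard-coding UTF-8 gives `str(obj)` a stable result independent of the console, at the cost of garbled display on a non-UTF-8 console; encoding with the console's codec displays correctly but makes `str(obj)` depend on where the program runs.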