
Python difference between print obj and print obj.__str__() [at least with Unicode?]

Tags:

python

unicode

I was given to understand that calling print obj would call obj.__str__(), which would in turn return a string to print to the console. Now I had a problem with Unicode where I could not print any non-ascii characters. I got the typical "ordinal not in range(128)" error.

While experimenting the following worked:

print obj.__str__()
print obj.__repr__()

With both functions doing exactly the same (__str__() just returns self.__repr__()). What did not work:

print obj

The problem occurred only when using a character outside the ascii range. The final solution was to do the following in __str__():

return self.__repr__().encode(sys.stdout.encoding)

Now it works in all cases. My question is: where is the difference, and why does it work now? I could understand it if nothing had worked before and this change fixed everything. But why did only the explicit calls (print obj.__str__()) work, and not the plain print obj?

OS is Windows 7 x64 with a default Windows command prompt. Also the encoding is reported to be cp850. This is more of a general question to understand python. My problem is already solved, but I am not 100% happy, mostly because now calling str(obj) will yield a string that is not encoded in the way I wanted it.

# -*- coding: utf-8 -*- 
class Sample(object):

    def __init__(self):
        self.name = u"üé"

    def __repr__(self):
        return self.name

    def __str__(self):
        return self.name

obj = Sample()
print obj.__str__(), obj.__repr__(), obj

Remove the last obj and it works. Keep it and it crashes with

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

javex asked Jul 03 '12 19:07


2 Answers

My guess is that print does something like the following for an object obj it's meant to print:

  1. Checks if the obj is a unicode. If so, encodes it to sys.stdout.encoding and prints.
  2. Checks if the obj is a str. If so, prints it directly.
  3. If obj is anything else, calls str(obj) and prints that.

Step 1. is why print obj.__str__() works in your case.

Now, what str(obj) does is:

  1. Call obj.__str__().
  2. If the result is a str, return it.
  3. If the result is a unicode, encode it to "ascii" and return that.
  4. Otherwise, do something mostly useless.

Calling obj.__str__() directly skips steps 2-3, which is why you don't get the encoding failure.

The problem isn't caused by how print works, it's caused by how str() works. str() ignores sys.stdout.encoding. Since it doesn't know what you want to do with the resulting string, the default encoding it uses can be considered arbitrary; ascii is as good or bad a choice as any.
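
The two conversions described above can be imitated with explicit encode calls, which behave the same way under Python 2 and Python 3. This is only a sketch of the guessed behaviour, not the interpreter's actual code; the helper names are hypothetical, and cp850 is assumed because that is the console encoding reported in the question.

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers imitating the conversions described above.
# They are illustrative only and are not part of Python itself.

TEXT = u"\xfc\xe9"  # the two characters from the question's example


def implicit_ascii(text):
    """Imitate str() turning a unicode result into bytes via 'ascii'."""
    return text.encode("ascii")


def explicit_console(text, encoding="cp850"):
    """Imitate print encoding a unicode object for the console."""
    return text.encode(encoding)


def try_encode(func, text):
    """Return (ok, result_or_error_name) instead of raising."""
    try:
        return True, func(text)
    except UnicodeEncodeError as exc:
        return False, type(exc).__name__


ascii_attempt = try_encode(implicit_ascii, TEXT)      # fails: not in range(128)
console_attempt = try_encode(explicit_console, TEXT)  # succeeds with cp850 bytes
```

The ascii attempt fails with the same UnicodeEncodeError as in the question, while the console-encoding attempt succeeds, which matches the difference between print obj and print obj.__str__().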

To prevent this bug, make sure you return a str from __str__() as the documentation tells you to do. A pattern you could use for Python 2.x might be:

import sys

class Foo(object):
    def __unicode__(self):
        return u'whatever'

    def __str__(self):
        # Encode the unicode form for the console (Python 2 only)
        return unicode(self).encode(sys.stdout.encoding)

(If you're sure you don't need the str() representation for anything but printing to the console.)
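
One caveat with this pattern: under Python 2, sys.stdout.encoding can be None when the output is piped or redirected, and encoding with None raises a TypeError. A minimal sketch of a fallback follows; the helper name console_encoding, the FakePipe stand-in, and the choice of "utf-8" as the fallback are all assumptions made for illustration.

```python
import sys


def console_encoding(stream=None, fallback="utf-8"):
    """Return the stream's encoding, or `fallback` if it is missing or None."""
    if stream is None:
        stream = sys.stdout
    return getattr(stream, "encoding", None) or fallback


class FakePipe(object):
    """Stand-in for a redirected stdout that reports no encoding."""
    encoding = None


class FakeConsole(object):
    """Stand-in for the cp850 console described in the question."""
    encoding = "cp850"


def encode_for(stream, text):
    """Encode text for the given stream, using the fallback if needed."""
    return text.encode(console_encoding(stream))
```

In __str__() this would replace sys.stdout.encoding with console_encoding(), so that the method still returns bytes when the script's output is redirected to a file.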

millimoose answered Oct 21 '22 13:10


First, if you look at the online documentation, __str__ and __repr__ have different purposes and should create different outputs. So calling __repr__ from __str__ is not the best solution.

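Second, print obj converts the object with str(), which expects __str__ to return a byte string. If __str__ returns a unicode instead, Python 2 encodes it with the default "ascii" codec, because it cannot guess which encoding you want, and that fails for non-ascii characters.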

Finally, in recent versions of Python 2.x, __unicode__ is the preferred method of creating a string representation for an object. There is an interesting explanation in the question "Python str versus unicode".

So, to try and really answer the question, you could do something like:

class Sample(object):

    def __init__(self):
        self.name = u"\xfc\xe9"

    # No need to implement __repr__. Let Python create the object repr for you

    def __str__(self):
        return unicode(self).encode('utf-8')

    def __unicode__(self):
        return self.name

Rodrigue answered Oct 21 '22 12:10