I'm using Python 2.7.3.
In Python, we use the magic methods __str__ and __unicode__ to define the behavior of str and unicode on our custom classes:
>>> class A(object):
    def __str__(self):
        print 'Casting A to str'
        return u'String'
    def __unicode__(self):
        print 'Casting A to unicode'
        return 'Unicode'
>>> a = A()
>>> str(a)
Casting A to str
'String'
>>> unicode(a)
Casting A to unicode
u'Unicode'
The behavior suggests that the return value from __str__ and __unicode__ is coerced to either str or unicode, depending on which magic method is run.
However, if we do this:
>>> class B(object):
    def __str__(self):
        print 'Casting B to str'
        return A()
    def __unicode__(self):
        print 'Casting B to unicode'
        return A()
>>> b = B()
>>> str(b)
Casting B to str
Traceback (most recent call last):
  File "<pyshell#47>", line 1, in <module>
    str(b)
TypeError: __str__ returned non-string (type A)
>>> unicode(b)
Casting B to unicode
Traceback (most recent call last):
  File "<pyshell#48>", line 1, in <module>
    unicode(b)
TypeError: coercing to Unicode: need string or buffer, A found
Calling str.mro() and unicode.mro() shows that both are subclasses of basestring. However, __unicode__ also allows returning buffer objects, and buffer inherits directly from object, not from basestring.
So, my question is: what actually happens when str and unicode are called? What are the return value requirements on __str__ and __unicode__ for use in str and unicode?
However, __unicode__ also allows returning of buffer objects, which directly inherit from object and don't inherit from basestring.
This is not correct. unicode() can convert a string or a buffer. It is a "best attempt" at converting the passed argument to unicode using the default encoding (that's why it says coercing). It will always return a unicode object.
So, my question is, what actually happens when str and unicode are called? What are the return value requirements on __str__ and __unicode__ for use in str and unicode?
__str__ should return an informal, human-friendly string representation of the object. This is what is called when someone uses str() on your object, or when your object is part of a print statement.
__unicode__ should always return a unicode object. If this method is not defined, __str__ is called and the result is then coerced to unicode (by passing it to unicode()).
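A minimal sketch of that contract, and certainly not the only way to do it: the text is built once in __unicode__, and __str__ derives its result from it. The class name Greeting, the PY2 flag, and the choice of UTF-8 are all assumptions made for illustration. The flag is only there so the snippet also runs on Python 3, where __unicode__ is not used by the interpreter.

```python
import sys

PY2 = sys.version_info[0] == 2  # hypothetical helper flag, not part of any API


class Greeting(object):
    """Illustrative class; the name is made up for this example."""

    def __init__(self, name):
        self.name = name

    def __unicode__(self):
        # Always hand back a text (unicode) object here.
        return u'Hello, ' + self.name

    def __str__(self):
        text = self.__unicode__()
        if PY2:
            # On Python 2, str is a byte string, so encode explicitly.
            # UTF-8 is an assumed encoding, not something Python mandates.
            return text.encode('utf-8')
        # On Python 3, str is already the text type.
        return text


greeting = Greeting(u'World')
print(str(greeting))
```

Encoding explicitly in __str__ avoids relying on the implicit default-encoding conversion, which tends to fail as soon as the text contains non-ASCII characters.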
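In your second example, you are returning invalid objects (instances of A, which are neither str nor unicode), which is why you are seeing the error messages. Your first example appears to work for __unicode__ because of a side effect (the returned str is implicitly decoded using the default encoding), but it is also not written correctly.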
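To make the failure in the second example concrete, here is a small sketch (the class names Inner, Broken and Fixed are hypothetical). It illustrates that str() does not recursively convert whatever __str__ returns, so the method has to do that conversion itself:

```python
class Inner(object):
    def __str__(self):
        return 'inner'


class Broken(object):
    def __str__(self):
        # Returns an object, not a string: str() rejects this.
        return Inner()


class Fixed(object):
    def __str__(self):
        # Convert explicitly before returning.
        return str(Inner())


error_message = None
try:
    str(Broken())
except TypeError as exc:
    # Capture the message instead of letting the traceback propagate.
    error_message = str(exc)

print(error_message)
print(str(Fixed()))
```

The same check on the return type of __str__ exists in Python 3, so this snippet behaves the same way there.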
The data model section of the documentation is worth a read for more information and details on these "magic methods".