
unicode class in Python

Tags: python, unicode

help(unicode) prints something like:

class unicode(basestring)
 |  unicode(string [, encoding[, errors]]) -> object
...

But you can pass something other than a basestring as the argument: unicode(1) returns u'1'. What happens in that call? int doesn't have a __unicode__ method to be called.

Juanjo Conti asked Apr 10 '26 14:04


2 Answers

If __unicode__ exists, it is called; otherwise unicode() falls back to __str__.

class A(int):
    def __str__(self):
        print "A.str"
        return int.__str__(self)

    def __unicode__(self):
        print "A.unicode"
        return int.__str__(self)

class B(int):
    def __str__(self):
        print "B.str"
        return int.__str__(self)


unicode(A(1)) # prints "A.unicode"
unicode(B(1)) # prints "B.str"

John La Rooy answered Apr 12 '26 03:04


unicode(1) behaves the same as unicode(str(1)).

>>> class thing(object):
...     def __str__(self):
...         print "__str__ called on " + repr(self)
...         return repr(self)
...
>>> a = thing()
>>> a
<__main__.thing object at 0x7f2f972795d0>
>>> unicode(a)
__str__ called on <__main__.thing object at 0x7f2f972795d0>
u'<__main__.thing object at 0x7f2f972795d0>'

If you really want to see the gritty bits underneath, open up the Python interpreter source code.

Objects/unicodeobject.c#PyUnicode_Type defines the unicode type, with constructor .tp_new=unicode_new.

Since the optional arguments encoding or errors are not given, and a unicode object is being constructed (as opposed to a unicode subclass), Objects/unicodeobject.c#unicode_new calls PyObject_Unicode.

Objects/object.c#PyObject_Unicode calls the __unicode__ method if it exists. If not, it falls back to Py_TYPE(v)->tp_str (a.k.a. __str__) or, if that slot is empty, Py_TYPE(v)->tp_repr (a.k.a. __repr__). It then passes the result to PyUnicode_FromEncodedObject.

Objects/unicodeobject.c#PyUnicode_FromEncodedObject finds that it was given a string, and passes it on to PyUnicode_Decode, which returns a unicode object.

Finally, PyObject_Unicode returns to unicode_new, which returns this unicode object.
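
The lookup order described above can be modelled roughly in pure Python. This is only a sketch, not the interpreter's code: to_text, WithUnicode and WithStrOnly are hypothetical names invented for illustration, and the "ascii" default stands in for the interpreter's default encoding as an assumption. It is written to run on Python 3 as well, where the unicode builtin no longer exists.

```python
def to_text(obj, encoding="ascii"):
    """Hypothetical model of the lookup order in PyObject_Unicode."""
    # 1. Prefer __unicode__ when the object's type defines it.
    method = getattr(type(obj), "__unicode__", None)
    if method is not None:
        result = method(obj)
    else:
        # 2. Otherwise use str(), which itself falls back to repr().
        result = str(obj)
    # 3. A byte-string result is decoded (the PyUnicode_Decode step).
    #    The "ascii" default is an assumption made for this sketch.
    if isinstance(result, bytes):
        result = result.decode(encoding)
    return result


class WithUnicode(object):
    def __unicode__(self):
        return u"from __unicode__"

    def __str__(self):
        return "from __str__"


class WithStrOnly(object):
    def __str__(self):
        return "from __str__"


print(to_text(WithUnicode()))  # from __unicode__
print(to_text(WithStrOnly()))  # from __str__
print(to_text(1))              # 1
```

An int takes the second branch because its type defines no __unicode__, which is why unicode(1) ends up equivalent to decoding str(1).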

In short, unicode() will automatically stringify your object if it needs to. This is Python working as expected.

ephemient answered Apr 12 '26 05:04


