totally confused by now... I am developing in python/django and using python logging. All of my app requires unicode and all my models have only a unicode()`, return u'..' methods implemented. Now when logging I have come upon a really strange issue that it took a long time to discover that I could reproduce it. I have tried both Py 2.5.5 and Py 2.6.4 and same thing. So
Whenever I do some straight forward logging like:
logging.debug(u'new value %s' % group)
this calls the models group.unicode(): return unicode(group.name)
My unicode methods all looks like this:
def __unicode__(self):
return u'%s - %s (%s)' % (self.group, self.user.get_full_name(), self.role)
This works even if group.name is XXX or ÄÄÄ (requiring unicode). But when I for some reason want to log a set, list, dictionary, django-query set and the individual instances in e.g. the list might be unicode or not I get into trouble...
So this will get me a UnicodeDecodingError whenever a group.name requires unicode like Luleå (my hometown)
logging.debug(u'new groups %s' % list_of_groups)
Typically I get an error like this:
Exception Type: UnicodeDecodeError
Exception Value: ('ascii', '<RBACInstanceRoleSet: s2 | \xc3\x84\xc3\x96\xc3\x96\xc3\x85\xc3\x85\xc3\x85 Gruppen>]', 106, 107, 'ordinal not in range(128)')
But if I do print list_of_groups
everything gets out nice on terminal
So, my understanding is that the list starts to generate itself and does repr() on all its elements and they return their values - in this case it should be 's2 | ÅÄÖÖ', then the list presents itself as (ascii, the-stuff-in-the-list) and then when trying to Decode the ascii into unicode this will of course not work -- since one of the elements in the list has returened a u'...' of itself when repr was done on it.
But why is this????´
And why do things work and unicode/ascii is handled correctly whenever I log simple things like group.name and so or group and the unicode methods are called. Whenever I get lazy and want to log a list, set or other things go bad whenever a unicode character is encountered...
Some more examples that work and fail. If group.name
I go to the model field and group
calls the __unicode__()
logging.debug("1. group: %s " % group.name) # WORKS
logging.debug(u"2. group: %s " % group) # WORKS
logging.debug("3. group: %s " % group) # FAILS
logging.debug(u"4. group: %s " % group.name) # WORKS
logging.debug("5. group: %s " % group.name) # WORKS
...and I really thought I had a grip on Unicode ;-(
Here's my test code:
#-*- coding: utf-8 -*-
class Wrap:
def __init__(self, s): self.s = s
def __repr__(self): return repr(self.s)
def __unicode__(self): return unicode(self.s)
def __str__(self): return str(self.s)
s = 'hello' # a plaintext string
u = 'ÅÄÖÖ'.decode('utf-8')
l = [s,u]
test0 = unicode(repr(l))
test1 = 'string %s' % l
test2 = u'unicode %s' % l
The above works fine when you run it. However, if you change the declaration of repr to: def repr(self): return unicode(self.s)
Then it aborts with:
Traceback (most recent call last):
File "mytest.py", line 13, in <module> unicode(l)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3:
ordinal not in range(128)
So it looks like someone in the object hierarchy has a repr() implementation which is incorrectly returning a unicode string instead of a normal string. As someone else mentioned, when you do a format string like
'format %s' % mylist
and mylist is a sequence, python automatically calls repr() on it rather than unicode() (since there is no "correct" way to represent a list as a unicode string).
It may be django that's at fault here, or maybe you've implemented __repr__
incorrectly in one of your models.
I can't reproduce your problem with a simple test:
Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15) [GCC 4.4.1] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import logging >>> group = u'Luleå' >>> logging.warning('Group: %s', group) WARNING:root:Group: Luleå >>> logging.warning(u'Group: %s', group) WARNING:root:Group: Luleå >>>
So, as Daniel says, there is probably something which is not proper Unicode in what you're passing to logging.
Also, I don't know what handlers you're using, but make sure if there are file handlers that you explicitly specify the output encoding to use, and if there are stream handlers you also wrap any output stream which needs it with an encoding wrapper such as is provided by the codecs
module (and pass the wrapped stream to logging).
Try to use this code in the top of your views.py
#-*- coding: utf-8 -*-
...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With