Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

properly logging unicode & utf-8 exceptions in python 2

I'm trying to log various exceptions from libraries in python 2.7. I find that sometimes the exceptions contain a unicode string and sometimes a utf8 bytestring. I thought that logging.exception(e) was the right approach to log them, but the following doesn't seem to work:

# encoding: utf-8
import logging
try:
    raise Exception('jörn')
except Exception as e:
    logging.exception(e)

try:
    raise Exception(u'jörn')
except Exception as e:
    logging.exception(e)

saving this into a file and running it results in this:

$ python test.py
ERROR:root:jörn
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    raise Exception('jörn')
Exception: jörn
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/lib/python2.7/logging/__init__.py", line 859, in emit
    msg = self.format(record)
  File "/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/lib/python2.7/logging/__init__.py", line 732, in format
    return fmt.format(record)
  File "/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/lib/python2.7/logging/__init__.py", line 474, in format
    s = self._fmt % record.__dict__
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 1: ordinal not in range(128)
Logged from file test.py, line 12

So as you see the utf8 exception worked fine, but the unicode exception broke logging, swallowing the real exception and hiding it behind a UnicodeEncodeError.

Is there some standard logging facility for exceptions that won't break my code? What am i missing?

like image 325
Jörn Hees Avatar asked Jun 30 '15 11:06

Jörn Hees


2 Answers

Actually, i think i finally found the mistake and a proper way myself: I seem to have used logging.exception('msg') wrong the whole time. You're not meant to pass in the exception, but a message:

# encoding: utf-8
import logging
try:
    raise Exception('jörn')
except Exception as e:
    logging.exception('exception occurred')

try:
    raise Exception(u'jörn')
except Exception as e:
    logging.exception('exception occurred')

running the above correctly logs the exception:

$ python test.py
ERROR:root:exception occurred
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    raise Exception('jörn')
Exception: jörn
ERROR:root:exception occurred
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    raise Exception(u'jörn')
Exception: j\xf6rn

The reason why logging.exception(e) seems to fail is that it passes the exception e up into logging.Formatter.format() where the it arrives as record.message variable which still is an Exception object.

Then in Line 474 the following happens:

s = self._fmt % record.__dict__

which is equivalent to the following:

s = '%(levelname)s:%(name)s:%(message)s' % {
   'levelname': 'ERROR',
   'name': 'ROOT',
   'message': Exception(u'jörn')
}

It turns out this is why if message is one of ['jörn', u'jörn', Exception('jörn')] it works and not if it is Exception(u'jörn'):

>>> 'foo %s' % 'jörn'
'foo j\xc3\xb6rn'
>>> 'foo %s' % u'jörn'
u'foo j\xf6rn'
>>> 'foo %s' % Exception('jörn')
'foo j\xc3\xb6rn'
>>> 'foo %s' % Exception(u'jörn')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 1: ordinal not in range(128)

As you can see there is an automatic upcasting that happens for unicode strings which is why the following works:

>>> logging.error('jörn')
ERROR:root:jörn
>>> logging.error(u'jörn')
ERROR:root:jörn

This transformation into unicode fails when attempting it with an Exception object that doesn't properly handle the encoding of its message (which sadly seems to be the case in a lot of libraries).

The logging.exception(msg) call seems to properly use the repr() for formatting the exception for logging and prefixes it with your msg. So if you don't make the mistake and pass the exception to logging.exception it will correctly log it.

So long story short:

Don't use logging.exception(e) but logging.exception('exception occurred'). It will automagically and correctly append the formatted exception to your log. If you really want to use the exception's message without assuming some encoding, the safest you can do is logging.exception(repr(e)).

like image 139
Jörn Hees Avatar answered Oct 12 '22 13:10

Jörn Hees


It's not the logging that fails to deal with unicode, it's the Exception.__str__ method which does not support unicode strings as exception arguments. When you invoke logging.exception(e) it will do something like logging.exception(str(e)) which in turn does something like str(self.args) on the exception instance. That's where the error comes from, your self.args is a unicode string which cannot be encoded in ascii. You have two options, either do logging.exception(unicode(e)) or implement your own exception class that provides a __str__ method that can deal with unicode objects in self.args.

The reason why your first test passes is that the editor encodes the string into UTF-8 and Python sees a string instance with encoded unicode characters.

like image 38
Bernhard Avatar answered Oct 12 '22 11:10

Bernhard