 

Should I add encoding='utf-8' to my Python logging handler?

Suppose I'd like to handle Unicode strings when logging using Python 2.7. It seems "right" to add the encoding parameter to the FileHandler.

# coding=utf-8
import logging

logger = logging.getLogger()
logger.addHandler(logging.FileHandler('my_log.txt', encoding='utf-8'))

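logger.error(u'Pão')  # unicode object: written to the file as UTF-8
logger.error('São')   # UTF-8 byte string: triggers a UnicodeDecodeError (problem 1 below)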

This has a couple of problems, though:

  1. It raises a UnicodeDecodeError on the UTF-8 string literal 'São'.
  2. The output file has LF line endings on Windows, when CRLF seems more appropriate.

If I don't pass any encoding at all, however, I have neither of those problems. Both strings are logged to a UTF-8 file and I get CRLF line endings. (I think the line ending issue has to do with the file opening in binary mode when an encoding is specified.)

Since omitting the encoding seems to work better, is there any reason I'm missing why I would ever pass encoding='utf-8'?

asked Feb 05 '14 at 19:02 by Eric Smith



1 Answer

If you pass an encoding to FileHandler, it uses codecs.open() with that encoding to open the file; otherwise, it uses plain open(). That's all the encoding is used for.
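
As a minimal sketch of that branch, the hypothetical helper open_handler_stream below mirrors the choice Python 2.7's FileHandler makes when it opens its file; it is not part of the logging API. The snippet is written to run on Python 3, where codecs.open() still forces binary mode, so the printed modes show the difference between the two kinds of stream.

```python
import codecs
import os
import tempfile


def open_handler_stream(filename, mode='a', encoding=None):
    """Hypothetical helper mirroring how Python 2.7's FileHandler opens its file."""
    if encoding is None:
        # No encoding given: a plain built-in open(), in text mode.
        return open(filename, mode)
    # Encoding given: codecs.open(), which adds 'b' to the mode and wraps the
    # binary file in an encoding StreamReaderWriter.
    return codecs.open(filename, mode, encoding)


# Compare the two kinds of stream on a throwaway file.
fd, path = tempfile.mkstemp()
os.close(fd)

plain = open_handler_stream(path)
print(type(plain).__name__, plain.mode)      # TextIOWrapper a
plain.close()

encoded = open_handler_stream(path, encoding='utf-8')
print(type(encoded).__name__, encoded.mode)  # StreamReaderWriter ab
encoded.close()

os.remove(path)
```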

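Remember that Python 2.x doesn't keep bytes and Unicode cleanly apart: implicit encoding and decoding happen at various points, and they can trip you up. That is what happens with 'São': it is a byte string, and when it is written to the codecs stream, Python 2 first decodes it implicitly with the default ASCII codec so that it can encode it as UTF-8, which fails on the non-ASCII bytes with a UnicodeDecodeError. In most cases you shouldn't be passing a string like 'São' as bytes at all: if it's text, you should be working with Unicode objects.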
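
To make that implicit step visible, here is a sketch in Python 3, where bytes and text are separate types and the decode that Python 2 performs silently has to be written out. The helper to_text is hypothetical; it assumes byte strings are UTF-8 (matching the # coding=utf-8 source in the question) and decodes them explicitly so that only Unicode reaches the logger.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return value as Unicode text, decoding bytes explicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# The UTF-8 bytes of 'São', like the '...' literal in the question's source file.
raw = b'S\xc3\xa3o'

# Roughly what Python 2 does behind the scenes when a byte string reaches a
# codecs writer: decode with ASCII first, which fails on the non-ASCII bytes.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    print('implicit ASCII decode fails:', exc.reason)

# Decoding explicitly with the right codec gives proper text to log.
print(to_text(raw))
```

Converting at the boundary where bytes enter the program, rather than relying on the logging handler to cope, keeps the rest of the code working purely with text.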

As for the line endings: Python's file I/O normally translates \n to the platform-specific line ending. But when codecs.open() is used, the underlying file is opened in binary mode, so no translation of \n to \r\n occurs, as it otherwise would on Windows.
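
A small sketch of the difference, using Python 3's built-in open(): a binary-mode file writes \n untouched, while a text-mode file translates it. Here newline='\r\n' stands in for the default translation Windows applies, so the result is the same on every platform. The helper names are hypothetical.

```python
import os
import tempfile


def written_bytes(write):
    """Call write(path) on a temporary file and return the raw bytes it produced."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write(path)
        with open(path, 'rb') as f:
            return f.read()
    finally:
        os.remove(path)


def binary_write(path):
    # Binary mode, as codecs.open() uses: '\n' goes to disk as-is.
    with open(path, 'wb') as f:
        f.write(u'Pão\n'.encode('utf-8'))


def text_write(path):
    # Text mode; newline='\r\n' forces the '\n' -> '\r\n' translation.
    with open(path, 'w', encoding='utf-8', newline='\r\n') as f:
        f.write(u'Pão\n')


print(repr(written_bytes(binary_write)))  # b'P\xc3\xa3o\n'
print(repr(written_bytes(text_write)))    # b'P\xc3\xa3o\r\n'
```

On Python 3 this particular problem goes away, because FileHandler opens its file with the built-in open() in text mode whether or not an encoding is given.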

answered Sep 16 '22 at 15:09 by Vinay Sajip
