Suppose I'd like to handle Unicode strings when logging using Python 2.7. It seems "right" to add the encoding parameter to the FileHandler.
# coding=utf-8
import logging
logger = logging.getLogger()
logger.addHandler(logging.FileHandler('my_log.txt', encoding='utf-8'))
logger.error(u'Pão')
logger.error('São')
This has a couple of problems, though: the byte string 'São' fails to be written (logging prints a UnicodeDecodeError traceback to stderr instead), and the file ends up with bare LF line endings rather than CRLF.
If I don't pass any encoding at all, however, I have neither of those problems. Both strings are logged to a UTF-8 file and I get CRLF line endings. (I think the line ending issue has to do with the file opening in binary mode when an encoding is specified.)
Since omitting the encoding seems to work better, is there some reason I'm missing that I would ever pass in encoding='utf-8'?
If you pass an encoding to FileHandler, it uses codecs.open() with that encoding to open the file; otherwise, it uses plain open(). That's all the encoding is used for.
Remember, Python 2.x doesn't handle the bytes/Unicode distinction cleanly: implicit encoding and decoding happens at various points, which can trip you up. You really shouldn't be passing a byte string like 'São' in most cases: if it's text, you should be working with unicode objects.
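A minimal illustration of that advice: decode bytes at the boundary where they enter your program, then keep only unicode objects from there on (the byte literal below is just the UTF-8 encoding of 'São'):

```python
raw = b'S\xc3\xa3o'          # UTF-8 bytes, e.g. read from a file or socket
text = raw.decode('utf-8')   # decode once, at the boundary...
assert text == u'S\xe3o'     # ...and pass only unicode objects around
# logger.error(text) is now safe whether or not the handler has an encoding
```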
As for the line endings: these are normally translated to the platform-specific line endings by Python's I/O machinery for files. But if codecs.open() is used, the underlying file is opened in binary mode, so no translation of \n to \r\n occurs, as it normally would on Windows.
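You can observe this directly: a file opened via codecs.open() writes \n verbatim on every platform, because the underlying file is binary (the temporary path here is just for illustration):

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.log')

f = codecs.open(path, 'w', encoding='utf-8')
f.write(u'line one\nline two\n')
f.close()

# Read back the raw bytes to see exactly what hit the disk.
with open(path, 'rb') as fh:
    data = fh.read()

# No \n -> \r\n translation happened, even on Windows:
assert b'\r\n' not in data
assert data == b'line one\nline two\n'
```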