My troubles with ConfigParser continue. It seems it doesn't support Unicode very well. The config file is indeed saved as UTF-8, but when ConfigParser reads it, it seems to be encoded into something else. I assumed it was latin-1 and I thought overriding optionxform could help:
-- configfile.cfg --
[rules]
Häjsan = 3
☃ = my snowman

-- myapp.py --
# -*- coding: utf-8 -*-
import ConfigParser

def _optionxform(s):
    try:
        newstr = s.decode('latin-1')
        newstr = newstr.encode('utf-8')
        return newstr
    except Exception, e:
        print e

cfg = ConfigParser.ConfigParser()
cfg.optionxform = _optionxform
cfg.read("myconfig")
Of course, when I read the config I get:
'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
I've tried a couple of different variations of decoding 's', but the point seems moot, since it really should be a unicode object from the beginning. After all, the config file is UTF-8? I have confirmed that something is wrong in the way ConfigParser reads the file by stubbing it out with this DummyConfig class. If I use that, then everything is nice unicode, fine and dandy.
-- config.py --
# -*- coding: utf-8 -*-
apa = {'rules': [(u'Häjsan', 3), (u'☃', u'my snowman')]}

class DummyConfig(object):
    def sections(self):
        return apa.keys()

    def items(self, section):
        return apa[section]

    def add_section(self, apa):
        pass

    def set(self, *args):
        pass
Any ideas about what could be causing this, or suggestions of other config modules that support Unicode better, are most welcome. I don't want to use sys.setdefaultencoding()!
ConfigParser is a Python class which implements a basic configuration language for Python programs. It provides a structure similar to Microsoft Windows INI files. ConfigParser lets you write Python programs that end users can easily customize.
Just use a StringIO object and the configparser's write method. It looks like the only method for "printing" the contents of a config object is ConfigParser.write, which takes a file-like object.
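A minimal sketch of that StringIO approach, assuming Python 3 and its `configparser` module. The helper name `dump_config` is made up for illustration, and setting `optionxform = str` is only there so the option names are not lowercased:

```python
import configparser
import io


def dump_config(cfg):
    """Return everything cfg.write() would emit, as a single string."""
    buf = io.StringIO()   # in-memory text buffer that acts like a file
    cfg.write(buf)        # write() only needs a file-like object
    return buf.getvalue()


cfg = configparser.ConfigParser()
cfg.optionxform = str  # keep option names as written (the default lowercases them)
cfg["rules"] = {"Häjsan": "3", "☃": "my snowman"}

text = dump_config(cfg)  # the INI text for the [rules] section
```

Since `text` is an ordinary `str`, it can be printed, logged, or encoded to UTF-8 bytes explicitly before being written somewhere else.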
configparser comes from Python 3 and as such it works well with Unicode.
The ConfigParser.readfp() method can take a file object. Have you tried opening the file with the correct encoding using the codecs module before passing it to ConfigParser, like below:
cfg.readfp(codecs.open("myconfig", "r", "utf8"))
For Python 3.2 or above, readfp() is deprecated. Use read_file() instead.
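For what it's worth, here is a rough Python 3 sketch of the same idea using read_file(). The temporary file and its contents are only there to make the snippet self-contained (they mirror the config from the question), and `optionxform = str` is an optional choice to keep the option names unchanged:

```python
import configparser
import os
import tempfile

# Create a small UTF-8 config file so the example can run on its own.
sample = "[rules]\nHäjsan = 3\n☃ = my snowman\n"
path = os.path.join(tempfile.mkdtemp(), "myconfig")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

cfg = configparser.ConfigParser()
cfg.optionxform = str  # keep option names exactly as written

# Open the file as UTF-8 text ourselves, then hand the file object over.
with open(path, encoding="utf-8") as f:
    cfg.read_file(f)

rules = dict(cfg.items("rules"))  # option names and values are plain str
```

Opening the file yourself means a missing file raises an error right away, instead of being skipped silently.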
In Python 3.2, an encoding parameter was added to read(), so it can now be used as:
cfg.read("myconfig", encoding='utf-8')
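A slightly fuller sketch of that call, assuming Python 3.2 or later. The temporary file stands in for the real "myconfig". Two details are easy to miss: read() returns the list of files it managed to parse and silently skips the ones it cannot open, and the default optionxform lowercases option names:

```python
import configparser
import os
import tempfile

# Stand-in for the real config file, saved as UTF-8.
sample = "[rules]\nHäjsan = 3\n☃ = my snowman\n"
path = os.path.join(tempfile.mkdtemp(), "myconfig")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

cfg = configparser.ConfigParser()

# read() does not raise for a missing file, so check what it returns.
read_ok = cfg.read(path, encoding="utf-8")
if not read_ok:
    raise FileNotFoundError(path)

options = cfg.options("rules")       # names as stored (lowercased by default)
value = cfg.get("rules", "Häjsan")   # lookups go through optionxform too
```

Because lookups are passed through the same optionxform, `cfg.get("rules", "Häjsan")` still finds the option even though it is stored under its lowercased name.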