
Using StringIO for ConfigObj and Unicode

I am trying to use StringIO to feed ConfigObj. I would like to do this in my unit tests, so that I can mock config "files", on the fly, depending on what I want to test in the configuration objects.

I handle a whole bunch of things in the configuration module (I read several conf files, then aggregate and "format" the information for the rest of the apps). However, in the tests, I am facing a Unicode error from hell. I think I have pinned my problem down to a minimal working example, which I have extracted and simplified for the purpose of this question.

I am doing the following:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import configobj
import io

def main():
    """Main stuff"""

    input_config = """
    [Header]
    author = PloucPlouc
    description = Test config

    [Study]
    name_of_study = Testing
    version = 9999
    """

    # Decode explicitly instead of relying on the default encoding
    input_config = unicode(input_config, "utf-8")

    test_config_fileio = io.StringIO(input_config)    
    print configobj.ConfigObj(infile=test_config_fileio, encoding="UTF8")

if __name__ == "__main__":
    main()

It produces the following traceback:

Traceback (most recent call last):
  File "test_configobj.py", line 101, in <module>
    main()
  File "test_configobj.py", line 98, in main
    print configobj.ConfigObj(infile=test_config_fileio, encoding='UTF8')
  File "/work/irlin168_1/USER/Apps/python272/lib/python2.7/site-packages/configobj-4.7.2-py2.7.egg/configobj.py", line 1242, in __init__
    self._load(infile, configspec)
  File "/work/irlin168_1/USER/Apps/python272/lib/python2.7/site-packages/configobj-4.7.2-py2.7.egg/configobj.py", line 1302, in _load
    infile = self._handle_bom(infile)
  File "/work/irlin168_1/USER/Apps/python272/lib/python2.7/site-packages/configobj-4.7.2-py2.7.egg/configobj.py", line 1442, in _handle_bom
    if not line.startswith(BOM):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)

I am using Python 2.7.2 (32-bit) on Linux. The locale for both the console and the editor (Kile) is set to fr_FR.utf8.

I thought I could do this.

From the io.StringIO documentation, I got this:

The StringIO object can accept either Unicode or 8-bit strings, but mixing the two may take some care.
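
For comparison, here is a minimal sketch (not part of the original post) of how the two buffer classes in the `io` module treat text and bytes. The quoted sentence appears to match the documentation of the Python 2 `StringIO` module rather than `io.StringIO`, which accepts Unicode text only. `SAMPLE` is a hypothetical stand-in for the config text, and the snippet is written to run on both Python 2.7 and Python 3.

```python
import io

# Hypothetical sample standing in for the config text from the question.
SAMPLE = u"[Header]\nauthor = PloucPlouc\n"

# io.StringIO holds decoded text (unicode) only.
text_stream = io.StringIO(SAMPLE)

# io.BytesIO holds encoded bytes only.
byte_stream = io.BytesIO(SAMPLE.encode("utf-8"))

# Handing encoded bytes to io.StringIO is rejected with a TypeError.
try:
    io.StringIO(SAMPLE.encode("utf-8"))
    rejected = False
except TypeError:
    rejected = True

first_text_line = text_stream.readline()  # text
first_byte_line = byte_stream.readline()  # bytes
```

So the choice of buffer class already decides whether the consumer will read text or bytes.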

And from ConfigObj documentation, I can do this:

>>> config = ConfigObj('config.ini', encoding='UTF8')
>>> config['name']
    u'Michael Foord'

and this:

infile: None

You don't need to specify an infile. If you omit it, an empty ConfigObj will be created. infile can be :

   [...]
   A StringIO instance or file object, or any object with a read method. The filename attribute of your ConfigObj will be None [5].

'encoding': None

By default ConfigObj does not decode the file/strings you pass it into Unicode [8]. If you want your config file as Unicode (keys and members) you need to provide an encoding to decode the file with. This encoding will also be used to encode the config file when writing.

My question is: why does this fail? What did I misunderstand about (simple) Unicode handling?

By looking at this answer, I changed:

input_config = unicode(input_config, "utf8")

to (importing the codecs module beforehand):

input_config = unicode(input_config, "utf8").strip(codecs.BOM_UTF8.decode("utf8", "strict"))

in order to get rid of a possibly included byte order mark, but it did not help.
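
A minimal sketch (not part of the original post) of why stripping the BOM plausibly changes nothing: the decoded text contains no BOM to begin with, and byte 0xef is the first byte of the `codecs.BOM_UTF8` constant. In Python 2, comparing a unicode line against that byte string triggers an implicit ASCII decode of the constant. The explicit `.decode("ascii")` below stands in for that implicit step, and `raw` is a hypothetical stand-in for the config bytes.

```python
import codecs

# Hypothetical stand-in for the raw config bytes from the question.
raw = b"[Header]\nauthor = PloucPlouc\n"
text = raw.decode("utf-8")

# The BOM as a single decoded character (U+FEFF).
bom_char = codecs.BOM_UTF8.decode("utf-8")

# The decoded input never started with a BOM, so stripping is a no-op.
has_bom = text.startswith(bom_char)
stripped_is_same = text.strip(bom_char) == text

# Decoding the BOM constant itself as ASCII fails on its first byte, 0xef.
try:
    codecs.BOM_UTF8.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)
```

Here `message` carries the same wording as the last line of the traceback, which suggests the offending byte comes from the BOM constant being compared, not from the input text.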

Thanks a lot

NB: I have the same traceback if I use StringIO.StringIO instead of io.StringIO.

Marc-Olivier Titeux asked Oct 08 '22 00:10



1 Answer

This line:

input_config = unicode(input_config, "utf8")

is converting your input to Unicode, but this line:

print configobj.ConfigObj(infile=test_config_fileio, encoding="UTF8")

is declaring the input to be a UTF-8-encoded byte string. The error indicates a Unicode string was passed when a byte string was expected, so commenting out the first line above should resolve the issue. I don't have configobj at the moment so can't test it.
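
A minimal sketch of that pairing, assuming the mismatch described above is the cause. Since `io.StringIO` only accepts Unicode text, the bytes are wrapped in `io.BytesIO` here, which relies on ConfigObj accepting any object with a read method, as the documentation quoted in the question states. `CONFIG_TEXT` and `make_config_stream` are hypothetical names, and the `ConfigObj` call is left as a comment because it was not run.

```python
import io

# Hypothetical sample config, mirroring the one in the question.
CONFIG_TEXT = u"[Header]\nauthor = PloucPlouc\n\n[Study]\nversion = 9999\n"


def make_config_stream(text, encoding="utf-8"):
    """Return a file-like object that yields encoded bytes, not unicode."""
    return io.BytesIO(text.encode(encoding))


stream = make_config_stream(CONFIG_TEXT)

# Intended use (untested assumption, not executed here):
#   configobj.ConfigObj(infile=stream, encoding="UTF8")

# Peek at the content, then rewind so the stream can still be consumed.
lines = stream.read().splitlines()
stream.seek(0)
```

In a unit test, each test case could build its own stream this way from whatever config text it wants to exercise.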

Mark Tolonen answered Oct 13 '22 11:10
