
What's the right way to use Unicode metadata in setup.py?

I was writing a setup.py for a Python package using setuptools and wanted to include a non-ASCII character in the long_description field:

#!/usr/bin/env python
from setuptools import setup
setup(...
      long_description=u"...", # in real code this value is read from a text file
      ...)

Unfortunately, passing a unicode object to setup() breaks both of the following commands with a UnicodeEncodeError:

python setup.py --long-description | rst2html
python setup.py upload

If I instead pass a UTF-8 encoded byte string for the long_description field, the following command breaks with a UnicodeDecodeError:

python setup.py register
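Both failure modes can be reproduced outside of setup.py. The following is a minimal sketch, assuming the failing commands convert the value with the ASCII codec (the default encoding in Python 2). The conversions are written out explicitly so the snippet also runs on Python 3, and the helper name conversion_error is made up for illustration:

```python
# -*- coding: utf-8 -*-
# Sketch of the catch-22: the same text fails in one direction as unicode
# and in the other direction as UTF-8 bytes when ASCII is the codec in use.

text = u"bl\u00e4h"                  # unicode object with one non-ASCII character
utf8_bytes = text.encode("utf-8")    # the same text as a UTF-8 byte string


def conversion_error(func):
    """Call func and return the name of the Unicode error it raises, if any."""
    try:
        func()
    except UnicodeError as exc:
        return type(exc).__name__
    return None


# unicode -> bytes with ASCII (assumed to mirror --long-description and upload)
encode_failure = conversion_error(lambda: text.encode("ascii"))

# bytes -> unicode with ASCII (assumed to mirror register)
decode_failure = conversion_error(lambda: utf8_bytes.decode("ascii"))

print(encode_failure)
print(decode_failure)
```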

I generally release software by running 'python setup.py sdist register upload', which means ugly hacks that look into sys.argv and pass the right object type are right out.

In the end I gave up and implemented a different ugly hack:

class UltraMagicString(object):
    # Catch-22:
    # - if I return Unicode, python setup.py --long-description as well
    #   as python setup.py upload fail with a UnicodeEncodeError
    # - if I return a UTF-8 string, python setup.py sdist register
    #   fails with a UnicodeDecodeError

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value

    def __unicode__(self):
        return self.value.decode('UTF-8')

    def __add__(self, other):
        return UltraMagicString(self.value + str(other))

    def split(self, *args, **kw):
        return self.value.split(*args, **kw)

...

setup(...
      long_description=UltraMagicString("..."),
      ...)

Isn't there a better way?

Marius Gedminas asked Jul 21 '09 23:07


2 Answers

It is apparently a distutils bug that has been fixed in Python 2.6: http://mail.python.org/pipermail/distutils-sig/2009-September/013275.html

Tarek suggests patching post_to_server. The patch should pre-process all values in the "data" argument, turn them into unicode, and then call the original method. See http://mail.python.org/pipermail/distutils-sig/2009-September/013277.html
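A rough sketch of what such a patch could look like follows. It is not Tarek's actual patch: the helper names decode_values and patch_post_to_server are made up, the (data, auth=None) signature is an assumption about the register command, and FakeRegister is a stand-in class so the snippet runs without distutils:

```python
# -*- coding: utf-8 -*-
import functools


def decode_values(data, encoding="utf-8"):
    """Return a copy of data with every byte-string value decoded to text."""
    decoded = {}
    for key, value in data.items():
        if isinstance(value, bytes):  # bytes is an alias of str on Python 2
            value = value.decode(encoding)
        decoded[key] = value
    return decoded


def patch_post_to_server(command_class):
    """Wrap command_class.post_to_server so it only ever sees text values."""
    original = command_class.post_to_server

    @functools.wraps(original)
    def post_to_server(self, data, auth=None):
        # Pre-process the values, then call the original method.
        return original(self, decode_values(data), auth)

    command_class.post_to_server = post_to_server
    return command_class


class FakeRegister(object):
    """Hypothetical stand-in for the real register command class."""

    def post_to_server(self, data, auth=None):
        return data


patch_post_to_server(FakeRegister)
result = FakeRegister().post_to_server(
    {"name": u"fudz", "long_description": u"bl\u00e4h".encode("utf-8")})
```

In a real setup.py the class passed to patch_post_to_server would presumably be the register command class, patched before setup() is called.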

Reinout van Rees answered Oct 03 '22 22:10


#!/usr/bin/env python
# -*- coding: utf-8 -*-

from setuptools import setup
setup(name="fudz",
      description="fudzily",
      version="0.1",
      long_description=u"bläh bläh".encode("UTF-8"), # in real code this value is read from a text file
      py_modules=["fudz"],
      author="David Fraser",
      author_email="[email protected]",
      url="http://en.wikipedia.org/wiki/Fudz",
      )

I tested with the code above: --long-description itself raises no error, only rst2html does; upload seems to work (although I cancelled before actually uploading), and register prompts me for a username, which I don't have. But the traceback in your comment is helpful: the automatic conversion to unicode in the register command is what causes the problem.

See the illusive setdefaultencoding for more information on this. Basically, you want Python's default encoding to be able to convert your encoded string back to unicode, but this is tricky to set up. In this case I think it's worth the effort:

import sys
reload(sys).setdefaultencoding("UTF-8")

Or, to be more correct, you can get the encoding from the locale. There is commented-out code in /usr/lib/python2.6/site.py that does this, but I'll leave that discussion for now.
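For illustration only, here is one way the encoding might be looked up from the locale. This is not the code from site.py. It is a sketch that uses locale.getpreferredencoding, and the helper name preferred_encoding and its UTF-8 fallback are assumptions:

```python
import codecs
import locale


def preferred_encoding(fallback="UTF-8"):
    """Return the locale's preferred encoding, or fallback if it is unusable."""
    # Passing False asks for the encoding without calling setlocale().
    name = locale.getpreferredencoding(False) or fallback
    try:
        # Normalise the name and check that Python has a codec for it.
        return codecs.lookup(name).name
    except LookupError:
        return codecs.lookup(fallback).name


encoding = preferred_encoding()
print(encoding)
```

The resulting name could then be used in place of the hard-coded "UTF-8". Note that a C or POSIX locale typically reports an ASCII codec, which would not decode the example string.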

David Fraser answered Oct 03 '22 22:10