I was writing a setup.py for a Python package using setuptools and wanted to include a non-ASCII character in the long_description field:
#!/usr/bin/env python
from setuptools import setup
setup(...
      long_description=u"...",  # in real code this value is read from a text file
      ...)
Unfortunately, passing a unicode object to setup() breaks both of the following commands with a UnicodeEncodeError:
python setup.py --long-description | rst2html
python setup.py upload
If I use a raw UTF-8 string for the long_description field, then the following command breaks with a UnicodeDecodeError:
python setup.py register
I generally release software by running 'python setup.py sdist register upload', which means ugly hacks that look into sys.argv and pass the right object type are right out.
In the end I gave up and implemented a different ugly hack:
class UltraMagicString(object):
    # Catch-22:
    # - if I return Unicode, python setup.py --long-description as well
    #   as python setup.py upload fail with a UnicodeEncodeError
    # - if I return UTF-8 string, python setup.py sdist register
    #   fails with a UnicodeDecodeError
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value

    def __unicode__(self):
        return self.value.decode('UTF-8')

    def __add__(self, other):
        return UltraMagicString(self.value + str(other))

    def split(self, *args, **kw):
        return self.value.split(*args, **kw)
...
setup(...
      long_description=UltraMagicString("..."),
      ...)
Isn't there a better way?
It is apparently a distutils bug that has been fixed in Python 2.6: http://mail.python.org/pipermail/distutils-sig/2009-September/013275.html
Tarek suggests patching post_to_server. The patch should pre-process all values in the "data" argument, turn them into unicode, and then call the original method. See http://mail.python.org/pipermail/distutils-sig/2009-September/013277.html
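A minimal sketch of what such a patch could look like. This is my reading of the suggestion, not code from the linked thread: the helper names to_unicode and decode_data_values are made up for illustration, and the commented-out wiring assumes that the register command's post_to_server takes (self, data, auth=None).

```python
def to_unicode(value, encoding="utf-8"):
    """Decode byte strings to unicode; leave other values untouched."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias of str
        return value.decode(encoding)
    if isinstance(value, (list, tuple)):
        # some fields (e.g. classifiers) may hold a sequence of strings
        return [to_unicode(item, encoding) for item in value]
    return value


def decode_data_values(original):
    """Wrap a post_to_server-style method so every value in `data` is unicode."""
    def wrapper(self, data, *args, **kwargs):
        decoded = dict((key, to_unicode(value)) for key, value in data.items())
        return original(self, decoded, *args, **kwargs)
    return wrapper


# Hypothetical wiring in setup.py, before calling setup()
# (assumes the distutils register command; untested against a real upload):
#
# from distutils.command.register import register
# register.post_to_server = decode_data_values(register.post_to_server)
```

The wrapper builds a new dict rather than modifying data in place, so the caller's metadata is left as it was.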
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from setuptools import setup
setup(name="fudz",
      description="fudzily",
      version="0.1",
      long_description=u"bläh bläh".encode("UTF-8"),  # in real code this value is read from a text file
      py_modules=["fudz"],
      author="David Fraser",
      author_email="[email protected]",
      url="http://en.wikipedia.org/wiki/Fudz",
      )
I'm testing with the above code. There is no error from --long-description itself, only from rst2html; upload seems to work OK (although I cancel before actually uploading), and register asks me for a username, which I don't have. But the traceback in your comment is helpful: it's the automatic conversion to unicode in the register command that causes the problem.
See the illusive setdefaultencoding for more information on this. Basically, you want Python's default encoding to be able to convert your encoded string back to unicode, but this is tricky to set up. In this case I think it's worth the effort:
import sys
reload(sys).setdefaultencoding("UTF-8")
Or, to be more correct, you can get the encoding from the locale. There is commented-out code in /usr/lib/python2.6/site.py that does this, but I'll leave that discussion for now.
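For what it's worth, here is a rough sketch of asking the locale for its encoding. It is not the code from site.py: it uses locale.getpreferredencoding from the standard library, and the helper name locale_encoding and the UTF-8 fallback are my own assumptions.

```python
import codecs
import locale


def locale_encoding(fallback="UTF-8"):
    """Return the locale's preferred encoding, or `fallback` if it is unusable."""
    name = locale.getpreferredencoding(False) or fallback
    try:
        codecs.lookup(name)  # make sure Python actually knows this codec
    except LookupError:
        return fallback
    return name


encoding = locale_encoding()
```

On Python 2 the result could be passed to setdefaultencoding in place of the hard-coded "UTF-8" above. Note that in a C/POSIX locale it is likely to be an ASCII-type encoding, which would not help with non-ASCII text.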