I'm having trouble with encodings. I'm using this version:
Python 2.7.2+ (default, Oct 4 2011, 20:03:08) [GCC 4.6.1] on linux2
I have characters with accents like é à. My scripts use UTF-8 encoding:
#!/usr/bin/python
# -*- coding: utf-8 -*-
Users can type strings using raw_input(), wrapped in this helper:
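import readline

def rlinput(prompt, prefill=''):
    # Pre-fill the input line with an editable default value.
    readline.set_startup_hook(lambda: readline.insert_text(prefill))
    try:
        return raw_input(prompt)
    finally:
        # Always remove the hook so later prompts start empty.
        readline.set_startup_hook()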
It is called in the main loop of the 'pseudo' shell:
while to_continue:
    to_continue, feedback = action(unicode(rlinput(u'todo > '), 'utf-8'))
    os.system('clear')
    print T, u"\n" + feedback
Data is stored in files using pickle.
I managed to get the app working, but I end up with silly things like this.
Core file:
class Task():
    ...
    def __str__(self):
        r = (u"OK" if self._done else u"A faire").ljust(8) + self.getDesc()
        return r.encode('utf-8')
And so, in the shell file:
feedback = jaune + str(t).decode('utf-8') + vert + u" supprimée"
That's where I realized that I might be totally wrong about encoding/decoding. So I tried to decode directly in rlinput, but failed. I read some posts on Stack Overflow and re-read http://docs.python.org/library/codecs.html. While waiting for my Python book, I'm lost :/
I guess there is a lot of bad code, but my question here is only about the encoding issues. You can find the code here (most comments are in French, sorry; it's for personal use and I'm a beginner. You'll also need yapsy - http://yapsy.sourceforge.net/ ; then configure the paths, then in py_todo run ./todo_shell.py): http://bit.ly/rzp9Jm
Standard input and output are byte-based on all Unix systems. That's why you have to call the unicode function to get character strings from them. The decode error indicates that the bytes coming in are not valid UTF-8.
Basically, the problem is the assumption of UTF-8 encoding, which is not guaranteed. Confirm this by changing the encoding in your unicode call to 'ISO-8859-1', or by changing the character encoding of your terminal emulator to UTF-8. (PuTTY supports this, in the "Translation" menu.)
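To see why the encoding assumption matters, here is a minimal sketch that decodes the same two characters (é à) from bytes in two encodings. The helper name try_decode is hypothetical and only serves the illustration; the byte values stand in for what a terminal might send.

```python
# -*- coding: utf-8 -*-
# The same two characters, as the bytes a terminal would send in each encoding.
text = u"\xe9\xe0"  # é à
utf8_bytes = text.encode("utf-8")         # bytes from a UTF-8 terminal
latin1_bytes = text.encode("iso-8859-1")  # bytes from an ISO-8859-1 terminal


def try_decode(raw, encoding):
    """Return the decoded text, or None if raw is not valid in that encoding.

    Hypothetical helper, for illustration only.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


utf8_ok = try_decode(utf8_bytes, "utf-8")           # decodes cleanly
latin1_as_utf8 = try_decode(latin1_bytes, "utf-8")  # fails: not valid UTF-8
latin1_ok = try_decode(latin1_bytes, "iso-8859-1")  # decodes cleanly
```

If your terminal sends ISO-8859-1 bytes, the second case is what unicode(..., 'utf-8') runs into.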
If the above experiment confirms this, your challenge is to support the locale of the user and deduce the correct encoding, or perhaps to make the user declare the encoding in a command-line argument or configuration. The $LANG environment variable is about the best you can do without an explicit declaration, and I find it to be a poor indicator of the desired character encoding.
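One possible way to combine both ideas is sketched below: take an explicit encoding if the user declared one, otherwise fall back to what Python reports for the terminal and the locale. The function names guess_input_encoding and to_text are hypothetical, and the guess is only as reliable as the user's locale settings.

```python
import locale
import sys


def guess_input_encoding(default="utf-8"):
    """Best-effort guess of the terminal encoding (hypothetical helper)."""
    # sys.stdin.encoding is usually set for a terminal, but can be None
    # when input is piped or redirected.
    encoding = getattr(sys.stdin, "encoding", None)
    if not encoding:
        # Derived from the locale settings (LANG / LC_* on Unix).
        encoding = locale.getpreferredencoding()
    return encoding or default


def to_text(raw, encoding=None):
    """Decode bytes typed by the user; pass already-decoded text through."""
    if not isinstance(raw, bytes):
        return raw
    # An explicit encoding (e.g. from a command-line option) wins over
    # the guess. Undecodable bytes are replaced instead of raising.
    return raw.decode(encoding or guess_input_encoding(), "replace")
```

With this, the call in the main loop could become action(to_text(rlinput(u'todo > '))), with an optional second argument taken from a command-line option or configuration file.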