
UTF-8 error with Python and gettext

My editor saves files as UTF-8, so every string shown here is UTF-8 encoded in the file.

I have a Python script like this:

# -*- coding: utf-8 -*-
...
parser = optparse.OptionParser(
  description=_('automates the dice rolling in the classic game "risk"'), 
  usage=_("usage: %prog attacking defending"))

Then I ran xgettext to extract the strings and got a .pot file, which boils down to:

"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

#: auto_dice.py:16
msgid "automates the dice rolling in the classic game \"risk\""
msgstr ""

After that, I used msginit to create de.po, which I filled in like this:

"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: auto_dice.py:16
msgid "automates the dice rolling in the classic game \"risk\""
msgstr "automatisiert das Würfeln bei \"Risiko\""

Running the script, I get the following error:

  File "/usr/lib/python2.6/optparse.py", line 1664, in print_help
    file.write(self.format_help().encode(encoding, "replace"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 60: ordinal not in range(128)

How can I fix that?

Martin Ueding asked Apr 04 '11 22:04



1 Answer

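That error means encode was called on a bytestring. Python 2 first has to decode the bytestring to Unicode, and it does so with the system default encoding (ASCII on Python 2). That implicit decode is what fails on the non-ASCII byte 0xc3, before the string is ever re-encoded with the encoding you specified.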
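The implicit step can be made explicit. This is a minimal sketch, not the optparse code itself: the variable names are made up for illustration, and the byte string is assumed to be the UTF-8 form of the translated msgstr from the question. It runs on both Python 2.6+ and Python 3.

```python
# UTF-8 bytes of the German translation (assumed: this is what a
# byte-oriented gettext lookup hands back). "ü" is the two bytes 0xc3 0xbc.
raw = b'automatisiert das W\xc3\xbcrfeln bei "Risiko"'

# On Python 2, raw.encode(...) silently performs this ASCII decode first.
# Doing it by hand reproduces the same kind of UnicodeDecodeError.
try:
    raw.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

The reported position differs from the traceback in the question because the full help text there has other content in front of the translated string.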

Generally, the way to resolve it is to call s.decode('utf-8') (or whatever encoding the strings are in) before using the strings, so that encode is only ever called on Unicode objects. It might also work if you just use Unicode literals: u'automates...' (that depends on how strings are substituted from .po files, which I don't know about).
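A short sketch of the decode-first approach, assuming the translated string arrives as UTF-8 bytes (the names raw, text, ascii_out and utf8_out are hypothetical). Once the text is a Unicode object, encoding it with "replace" cannot raise, which is what the optparse line in the traceback expects:

```python
# UTF-8 bytes as they would come out of the .po catalog (assumption).
raw = b'automatisiert das W\xc3\xbcrfeln bei "Risiko"'

# Decode explicitly with the encoding the bytes are really in.
text = raw.decode('utf-8')

# Encoding a Unicode string is now safe, even for a terminal that only
# understands ASCII: unencodable characters are replaced with "?".
ascii_out = text.encode('ascii', 'replace')

# Encoding back to UTF-8 round-trips to the original bytes.
utf8_out = text.encode('utf-8')

print(ascii_out)
print(utf8_out == raw)
```

Python 2's gettext module also had Unicode-returning variants (ugettext, or gettext.install(domain, unicode=True)). Whether one of them fits this script is an assumption, since the question does not show how _ is installed.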

This sort of confusing behaviour is improved in Python 3, which won't try to convert bytes to unicode unless you specifically tell it to.
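For comparison, a small sketch of the stricter behaviour, assuming a Python 3 interpreter (the variable names are illustrative only). Bytes objects have no encode method there, and mixing bytes with text raises immediately instead of triggering a hidden ASCII decode:

```python
raw = b'W\xc3\xbcrfeln'

# On Python 3, bytes cannot be "encoded" at all, so the mistake from the
# traceback cannot happen silently.
has_encode = hasattr(raw, 'encode')

# Concatenating text and bytes is a TypeError, not an implicit conversion.
try:
    'Risiko: ' + raw
    mixed_error = None
except TypeError as exc:
    mixed_error = str(exc)

# The conversion only happens when it is requested explicitly.
text = 'Risiko: ' + raw.decode('utf-8')

print(has_encode, mixed_error is not None, len(text))
```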

Thomas K answered Oct 01 '22 21:10