
How do I display non-english characters in python?

I have a python dictionary which contains items that have non-english characters. When I print the dictionary, the python shell does not properly display the non-english characters. How can I fix this?

alwbtc asked Dec 17 '22 06:12


1 Answer

When your application prints hei\xdfen instead of heißen, it means you are not printing the actual unicode text but the string representation (the repr) of the unicode object.

Let us assume your string ("heißen") is stored in a variable called text. To see what you are dealing with, check the type of this variable by calling:

>>> type(text)

If you get <type 'unicode'>, it means you are dealing with a unicode object rather than a byte string (str).

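The escaped form shows up whenever Python displays the repr of the object instead of the text itself: when you evaluate text at the interactive prompt, or when you print a container (such as a list or a dictionary) that holds it. Doing the intuitive thing and invoking print(text) can also fail, typically with a UnicodeEncodeError, when Python cannot work out which encoding your terminal uses.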

To fix this, you need to know which encoding your terminal has and print out your unicode object encoded according to the given encoding.

For instance, if your terminal uses UTF-8 encoding, you can print out a string by invoking:

print text.encode('utf-8')
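
As a rough sketch of the step above: the terminal's encoding can usually be read from sys.stdout.encoding, and the text is then encoded to match before it is written out. The fallback to UTF-8 and the 'replace' error handler are assumptions made for this illustration, not something the answer prescribes. The snippet is written so that it runs under both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import sys

text = u'hei\xdfen'  # the text "heißen"; \xdf is the code point of ß

# sys.stdout.encoding can be None (for example when output is piped),
# so fall back to UTF-8. The fallback is an assumption of this sketch.
terminal_encoding = getattr(sys.stdout, 'encoding', None) or 'utf-8'

# 'replace' substitutes '?' for characters the terminal cannot show
# instead of raising UnicodeEncodeError.
encoded_for_terminal = text.encode(terminal_encoding, 'replace')

# For a UTF-8 terminal the result is the two-byte sequence C3 9F for ß.
utf8_bytes = text.encode('utf-8')
```

Under Python 2, print encoded_for_terminal then writes bytes that the terminal can interpret.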

So much for the basic concepts. Now let me give you a more detailed example. Let us assume we have a source code file that stores your dictionary, like this:

mydict = {'heiße': 'heiße', 'äää': 'ööö'}

When you type print mydict you will get {'\xc3\xa4\xc3\xa4\xc3\xa4': '\xc3\xb6\xc3\xb6\xc3\xb6', 'hei\xc3\x9fe': 'hei\xc3\x9fe'}. Even print mydict['äää'] doesn't work: it results in something like ├Â├Â├Â. The nature of the problem is revealed by print type(mydict['äää']), which tells you that you are dealing with a byte string (str) object, not a unicode object. The \xc3\xa4 escapes are the raw UTF-8 bytes from your source file.
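
The garbled ├Â├Â├Â output can be reproduced deliberately. A minimal sketch, assuming the terminal uses the cp850 code page (a guess based on the characters shown; the answer does not say which terminal was used): the UTF-8 bytes of ööö are reinterpreted one byte at a time.

```python
# -*- coding: utf-8 -*-
text = u'\xf6\xf6\xf6'  # the text "ööö"

# In UTF-8 every ö takes two bytes: 0xC3 0xB6.
utf8_bytes = text.encode('utf-8')

# Assumption: the terminal interprets bytes as cp850. It then reads each
# byte as its own character: 0xC3 is "├" and 0xB6 is "Â".
garbled = utf8_bytes.decode('cp850')

# Decoding with the encoding that actually produced the bytes recovers the text.
recovered = utf8_bytes.decode('utf-8')
```

The bytes themselves are intact; only the encoding used to interpret them is wrong.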

In order to fix the problem, you first need to decode the byte string from your source code file's charset into a unicode object and then encode it in the charset of your terminal. For individual dict items this can be achieved by:

print unicode(mydict['äää'], 'utf-8')

Note that if Python's default output encoding does not match your terminal's, you need to encode explicitly:

print unicode(mydict['äää'], 'utf-8').encode('utf-8')

Here the outer encode call specifies the encoding your terminal expects.
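
Since the original question is about printing a whole dictionary, the same decode step can be applied to every key and value. The following is only a sketch under a few assumptions: decode_dict and format_dict are hypothetical helper names, the dictionary is flat, and its byte strings are UTF-8 encoded. It runs under both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-

# UTF-8 byte strings, which is how Python 2 stores non-ASCII literals
# read from a UTF-8 encoded source file.
mydict = {
    b'hei\xc3\x9fe': b'hei\xc3\x9fe',                      # heiße
    b'\xc3\xa4\xc3\xa4\xc3\xa4': b'\xc3\xb6\xc3\xb6\xc3\xb6',  # äää: ööö
}


def decode_dict(d, source_encoding='utf-8'):
    """Return a copy of d with byte-string keys and values decoded to text."""
    return dict((key.decode(source_encoding), value.decode(source_encoding))
                for key, value in d.items())


def format_dict(d):
    """Build a 'key: value' listing from the text itself, bypassing repr()."""
    return u', '.join(u'%s: %s' % (key, d[key]) for key in sorted(d))


decoded = decode_dict(mydict)
listing = format_dict(decoded)
```

Under Python 2, print listing.encode('utf-8') then shows heiße: heiße, äää: ööö on a UTF-8 terminal, because the listing is built from the decoded text rather than from the escaped representation that print mydict uses.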

I strongly urge you to read Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)". Unless you understand how character sets work, you will stumble across problems like this again and again.

jsalonen answered Dec 19 '22 10:12