
Python3: UnicodeEncodeError: 'ascii' codec can't encode character '\xfc'

I'm trying to get a very simple example running on OS X with Python 3.5.1, but I'm really stuck. I have read many articles that deal with similar problems, but I cannot fix this myself. Do you have any hints on how to resolve this issue?

I would like to get correctly encoded latin-1 output for the strings defined in mylist, without any errors.

My code:

# coding=<latin-1>

mylist = [u'Glück', u'Spaß', u'Ähre',]
print(mylist)

The error:

Traceback (most recent call last):
  File "/Users/abc/test.py", line 4, in <module>
    print(mylist)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 4: ordinal not in range(128)

I can avoid the error like this, but the output from print is still not what I want:

mylist = [u'Glück', u'Spaß', u'Ähre',]
for w in mylist:
    print(w.encode("latin-1"))

What I get as output:

b'Gl\xfcck'
b'Spa\xdf'
b'\xc4hre'

What 'locale' shows me:

LANG="de_AT.UTF-8"
LC_COLLATE="de_AT.UTF-8"
LC_CTYPE="de_AT.UTF-8"
LC_MESSAGES="de_AT.UTF-8"
LC_MONETARY="de_AT.UTF-8"
LC_NUMERIC="de_AT.UTF-8"
LC_TIME="de_AT.UTF-8"
LC_ALL=

What 'python3' shows me:

Python 3.5.1 (default, Jan 22 2016, 08:54:32) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'

Hans Bondoka asked Aug 10 '16 20:08


2 Answers

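The 'ascii' codec in the traceback means Python chose ASCII as the encoding for stdout. Try running your script with the PYTHONIOENCODING environment variable set explicitly, which overrides that choice: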

PYTHONIOENCODING=utf-8 python3 script.py
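
If setting an environment variable is not convenient, a similar effect can be sketched inside the script by wrapping the binary stdout stream in a UTF-8 text writer. This is only a minimal sketch and not part of the original answer: the helper name utf8_writer is hypothetical, and it assumes the terminal really does expect UTF-8 (as the locale output in the question suggests).

```python
import io
import sys


def utf8_writer(binary_stream):
    """Wrap a binary stream in a text layer that always encodes as UTF-8.

    `utf8_writer` is a hypothetical helper name used for illustration.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


# Show which encoding Python picked for stdout in this environment.
print("current stdout encoding:", getattr(sys.stdout, "encoding", None))

# Demonstration against an in-memory buffer instead of the real terminal.
demo_buffer = io.BytesIO()
demo_out = utf8_writer(demo_buffer)
print("Glück", file=demo_out)
demo_out.flush()

# The repr of a bytes object is pure ASCII, so this print works
# even when stdout itself is limited to ASCII.
print(demo_buffer.getvalue())

# In a real script (assumption: sys.stdout exposes a .buffer attribute):
#     sys.stdout = utf8_writer(sys.stdout.buffer)
#     print(mylist)
```

In the demonstration, the buffer ends up holding the two-byte UTF-8 sequence c3 bc for ü, regardless of what encoding stdout was given at startup.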

markhor answered Oct 14 '22 14:10


Remove the characters < and >:

# coding=latin-1

Those characters are often used in examples to indicate where the encoding name goes, but the literal characters < and > should not be included in your file.

For that to work, your file must be encoded using latin-1. If your file is actually encoded using utf-8, the encoding line should be

# coding=utf-8

For example, when I run this script (saved as a file with latin-1 encoding):

# coding=latin-1

mylist = [u'Glück', u'Spaß', u'Ähre',]
print(mylist)

for w in mylist:
    print(w.encode("latin-1"))

I get this output (with no errors):

['Glück', 'Spaß', 'Ähre']
b'Gl\xfcck'
b'Spa\xdf'
b'\xc4hre'

That output looks correct. For example, the latin-1 encoding of ü is '\xfc'.
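
The b'...' lines are the repr of bytes objects, which is what print shows when it is handed bytes rather than str. If the goal is to emit the raw latin-1 bytes themselves, one possible approach is to write to the binary layer underneath stdout. A minimal sketch, assuming the receiving terminal or file is meant to hold latin-1 (a UTF-8 terminal, such as the one the question's locale describes, would likely display these bytes incorrectly); write_latin1_lines is a hypothetical helper name:

```python
import io


def write_latin1_lines(words, binary_stream):
    """Encode each word as latin-1 and write it, one per line, as raw bytes."""
    for word in words:
        binary_stream.write(word.encode("latin-1") + b"\n")


mylist = ["Glück", "Spaß", "Ähre"]

# Demonstration against an in-memory buffer; in a script, pass
# sys.stdout.buffer instead to send the bytes to the real stdout.
demo = io.BytesIO()
write_latin1_lines(mylist, demo)
print(demo.getvalue())  # b'Gl\xfcck\nSpa\xdf\n\xc4hre\n'
```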

I used my editor to save the file with latin-1 encoding. The contents of the file in hexadecimal are:

$ hexdump -C  codec-question.py 
00000000  23 20 63 6f 64 69 6e 67  3d 6c 61 74 69 6e 2d 31  |# coding=latin-1|
00000010  0a 0a 6d 79 6c 69 73 74  20 3d 20 5b 75 27 47 6c  |..mylist = [u'Gl|
00000020  fc 63 6b 27 2c 20 75 27  53 70 61 df 27 2c 20 75  |.ck', u'Spa.', u|
00000030  27 c4 68 72 65 27 2c 5d  0a 70 72 69 6e 74 28 6d  |'.hre',].print(m|
00000040  79 6c 69 73 74 29 0a 0a  66 6f 72 20 77 20 69 6e  |ylist)..for w in|
00000050  20 6d 79 6c 69 73 74 3a  0a 20 20 20 20 70 72 69  | mylist:.    pri|
00000060  6e 74 28 77 2e 65 6e 63  6f 64 65 28 22 6c 61 74  |nt(w.encode("lat|
00000070  69 6e 2d 31 22 29 29 0a                           |in-1")).|
00000078

Note that the first byte in the third line of the dump (the byte at offset 0x20) is fc. That is the latin-1 encoding of ü. If the file were encoded using utf-8, the character ü would be represented using two bytes, c3 bc.
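
The same byte-level check can be done from Python. The sketch below is an illustration rather than part of the original answer; encoded_hex and decodes_as_utf8 are hypothetical helper names. Keep in mind that a pure-ASCII file passes the UTF-8 check too, since ASCII is a subset of UTF-8.

```python
def encoded_hex(text, encoding):
    """Return the bytes of `text` in `encoding` as a lowercase hex string."""
    return text.encode(encoding).hex()


def decodes_as_utf8(raw):
    """Return True if the byte string `raw` is valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(encoded_hex("ü", "latin-1"))  # fc
print(encoded_hex("ü", "utf-8"))    # c3bc

# The bytes of u'Glück' as they appear in the hexdump (latin-1).
latin1_bytes = b"u'Gl\xfcck'"
print(decodes_as_utf8(latin1_bytes))                # False
print(decodes_as_utf8("u'Glück'".encode("utf-8")))  # True
```

To apply the check to a real file, read it in binary mode (for example, open(path, "rb").read()) and pass the result to decodes_as_utf8. If it returns False, the file is not UTF-8 and a declaration such as latin-1 may be the right one.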

Warren Weckesser answered Oct 14 '22 13:10