
Why doesn't Python recognize my utf-8 encoded source file?

Here is a small tmp.py containing a non-ASCII character:

if __name__ == "__main__":
    s = 'ß'
    print(s)

When I run it, I get the following error:

Traceback (most recent call last):
  File ".\tmp.py", line 3, in <module>
    print(s)
  File "C:\Python32\lib\encodings\cp866.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xdf' in position 0: character maps to <undefined>

The Python docs say:

By default, Python source files are treated as encoded in UTF-8...

My way of checking the encoding is to use Firefox (maybe someone can suggest something more obvious). I open tmp.py in Firefox, and if I select View->Character Encoding->Unicode (UTF-8) it looks OK, that is, the way it looks above in this question (with the ß symbol).

If I put:

# -*- encoding: utf-8 -*-

as the first line of tmp.py, nothing changes: the error persists.

Could someone help me figure out what I am doing wrong?

Asked by Anton Daneyko on Jan 11 '13 at 18:01




1 Answer

The encoding your terminal is using doesn't support that character:

>>> '\xdf'.encode('cp866')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/encodings/cp866.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\xdf' in position 0: character maps to <undefined>

Python is reading your source file just fine; it's your output encoding (cp866) that cannot represent the character.
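
To confirm that the problem lies in the output encoding rather than in the source file, something like the following sketch may help. It reports the encoding Python has picked for stdout and checks whether a given codec can represent the character. The helper name can_encode is made up for this illustration; it is not part of the standard library.

```python
import sys


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    s = "\xdf"  # the character from the question (ß), written as an escape

    # The encoding Python uses when printing to this console.
    print("stdout encoding:", sys.stdout.encoding)

    # cp866 has no mapping for this character; UTF-8 can encode it.
    print("cp866 can encode it:", can_encode(s, "cp866"))
    print("utf-8 can encode it:", can_encode(s, "utf-8"))
```

If the first line reports cp866 (or another code page that lacks the character), print() will fail no matter how the source file is encoded.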

You can try running chcp 65001 in the Windows console to switch to the UTF-8 code page; chcp is the Windows command-line command for changing code pages.
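
If changing the code page is not an option, one possible workaround (an assumption on my part, not something tested on the asker's setup) is to encode the text with an error handler before printing, so that unsupported characters are escaped or replaced instead of raising UnicodeEncodeError. The helper name safe_text below is hypothetical.

```python
import sys


def safe_text(text, encoding=None, errors="backslashreplace"):
    """Return `text` with characters the target encoding lacks escaped or replaced.

    Hypothetical helper: falls back to stdout's encoding, then to ASCII,
    when no encoding is given.
    """
    encoding = encoding or sys.stdout.encoding or "ascii"
    # Encode with a forgiving error handler, then decode back to str
    # so the result is safe to pass to print() for that encoding.
    return text.encode(encoding, errors=errors).decode(encoding)


if __name__ == "__main__":
    s = "\xdf"  # ß
    print(safe_text(s, "cp866"))                    # prints \xdf
    print(safe_text(s, "cp866", errors="replace"))  # prints ?
```

This loses the original character in the output, so it is only a fallback for consoles that genuinely cannot display it.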

My terminal on OS X, which uses UTF-8, handles it just fine:

>>> print('\xdf')
ß

Answered by Martijn Pieters on Oct 05 '22 at 16:10