
How can I successfully use Unicode characters in my .py files without causing trouble?

Tags:

python

unicode

I am writing a test for a database which has Swedish characters in it. In the test, I directly use characters with umlauts and other such Swedish letters, and it runs just fine, reading filenames in from the database and doing string compares successfully.

However, upon importing this file to do pydoc generation, I get the all-too-familiar exception:

SyntaxError: Non-ASCII character '\xc3' in file foo.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

After some investigation of my own, I found that adding

# -*- coding: iso-8859-15 -*-

to the top of my file fixed the importing problem. However, now the test fails all of the string comparisons. I tried the alternate method of forgoing the coding declaration and writing the strings as

u"Bokmärken"

... but this still doesn't keep the test from failing.

Does anyone know a good way to fix this?

Staunch asked Dec 27 '22 17:12


1 Answer

The encoding your editor saves the file in and the encoding your database uses need to match. If your database is utf-8 encoded, and not iso-8859-15, then setting your editor to utf-8 should fix it. However, since your u'string' comparisons also fail, this might not be the whole story.
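To see why a mismatch breaks comparisons, here is a minimal sketch. It assumes the file was actually saved as utf-8, which the '\xc3' in the SyntaxError suggests, since "ä" is the two bytes \xc3\xa4 in utf-8. The variable names are only for illustration, and the string is written with an escape so the example does not depend on how it is saved.

```python
# "Bokmärken", written with an escape so this file's own encoding does not matter
text = u"Bokm\xe4rken"

# The same text becomes different bytes under the two encodings
utf8_bytes = text.encode("utf-8")          # "ä" becomes two bytes: \xc3\xa4
latin9_bytes = text.encode("iso-8859-15")  # "ä" becomes one byte: \xe4

# Reading utf-8 bytes as if they were iso-8859-15 silently gives different text,
# which is what happens when the coding declaration does not match the file
misread = utf8_bytes.decode("iso-8859-15")

print(repr(utf8_bytes))
print(repr(latin9_bytes))
print(repr(misread))
print(misread == text)  # the comparison fails
```

If this is what is happening, declaring iso-8859-15 for a file saved as utf-8 would make the import work while still turning every "ä" into two wrong characters.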

Replace

# -*- coding: iso-8859-15 -*-

with

# -*- coding: utf-8 -*-

or (the equivalent)

# coding=utf-8

to try utf-8 encoding. The declaration has to go on the first or second line of the file, right after your interpreter line.

Printing debugging output with repr('swedish string') and repr(u'swedish string') will also be useful for inspecting the differences. Can you tell us what encoding your database is set to? Additionally, was the data written to the database by Python or inserted directly? You could have written data in the wrong encoding to the database to begin with, which is now causing problems on comparison.
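As a rough sketch of that kind of debugging, the snippet below prints the repr of both sides and decodes the database value before comparing. The helper to_text is hypothetical, and the value in from_db is an assumption about what a driver might hand back (raw utf-8 bytes); check what yours actually returns.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings to text, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Assumed for illustration: the driver returns raw utf-8 bytes
from_db = b"Bokm\xc3\xa4rken"
# The string literal in the test, written with an escape for "ä"
expected = u"Bokm\xe4rken"

# repr shows the exact bytes or code points on each side
print(repr(from_db))
print(repr(expected))

# Compare text with text, decoding with the database's encoding first
print(to_text(from_db) == expected)
```

If the two repr outputs differ in more than the byte-versus-text form, for example if the database side shows four bytes where you expect two, the data was probably stored in the wrong encoding in the first place.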

shelhamer answered Dec 30 '22 11:12