I'm having a problem with encoding in my application and couldn't find a solution anywhere on the web.
Here is the scenario:
PostgreSQL with UTF-8 encoding (CREATE DATABASE xxxx WITH ENCODING 'UTF8')
Python logic also with UTF-8 encoding (# -*- coding: utf-8 -*-)
Jinja2 to show my HTML pages. Python and Jinja2 are used on Flask, which is the microframework I'm using.
The header of my pages has: <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
Well, using psycopg2 to do a simple query and print it on Jinja2, this is what I get:
{% for company in list %}
<li>
{{ company }}
</li>
{% endfor %}
(1, 'Casa das M\xc3\xa1quinas', 'R. Tr\xc3\xaas, Mineiros - Goi\xc3\xa1s')
(2, 'Ar do Z\xc3\xa9', 'Av. S\xc3\xa9tima, Mineiros - Goi\xc3\xa1s')
If I try to go deeper into the fields:
{% for company in list %}
<li>
{% for field in company %}
<li>
{{ field }}
</li>
{% endfor %}
</li>
{% endfor %}
I get the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
However, if I print the list fields before sending them to Jinja2, I get the expected result (which is also how it is presented in PostgreSQL):
1 Casa das Máquinas R. Três, Mineiros - Goiás
2 Ar do Zé Av. Sétima, Mineiros - Goiás
When I get the error, Flask offers an option to "debug". This is where the code breaks:
File "/home/anonimou/Desktop/flask/lib/python2.7/site-packages/jinja2/_markupsafe/_native.py", line 21, in escape
return Markup(unicode(s)
And I can also do:
[console ready]
>>> print s
Casa das Máquinas
>>> s
'Casa das M\xc3\xa1quinas'
>>> unicode(s)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
>>> s.decode('utf-8')
u'Casa das M\xe1quinas'
>>> s.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
>>> s.decode('utf-8').encode('utf-8')
'Casa das M\xc3\xa1quinas'
>>> print s.decode('utf-8').encode('utf-8')
Casa das Máquinas
>>> print s.decode('utf-8')
Casa das Máquinas
I've already tried breaking up the list and decoding and encoding it in Python code before sending it to Jinja2, with the same error.
Sooo, not sure what I can do here. =(
Thanks in advance!
The issue is that psycopg2 returns byte strings by default in Python 2:
When reading data from the database, in Python 2 the strings returned are usually 8 bit str objects encoded in the database client encoding
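To see why the template fails, it may help to reproduce the error outside Flask. The sketch below assumes the value arrives as a UTF-8 byte string, as in the console session in the question. Python 2's unicode(s) decodes with the ASCII codec implicitly, and the explicit decode('ascii') call here mimics that. The names raw, ascii_failed and text are illustrative only.

```python
# -*- coding: utf-8 -*-
# A byte string as psycopg2 hands it back by default in Python 2
# (the same bytes shown in the question's console session).
raw = b'Casa das M\xc3\xa1quinas'

# Python 2's unicode(s) implicitly decodes with the ASCII codec;
# doing the same explicitly reproduces the UnicodeDecodeError.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the encoding the database actually uses succeeds
# and yields a Unicode object that Jinja2 can escape safely.
text = raw.decode('utf-8')
```

The byte 0xc3 at position 10 is the first half of the two-byte UTF-8 sequence for "á", which is why the ASCII codec rejects it.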
So you can either:
Manually decode all of the data from UTF-8 into Unicode objects:
# Decode the byte strings into Unicode objects using
# the encoding you know that your database is using.
# Each row is a tuple, so decode only its str (byte string) fields.
companies = [
    tuple(f.decode("utf-8") if isinstance(f, str) else f for f in row)
    for row in companies
]
return render_template("companies.html", companies=companies)
or
Set the encoders when you first import psycopg2 as per the note in the same section of the manual:
Note: In Python 2, if you want to uniformly receive all your database input in Unicode, you can register the related typecasters globally as soon as Psycopg is imported:
import psycopg2
import psycopg2.extensions
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
and then forget about this story.
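If registering the typecasters globally is not an option, the manual decoding route can be wrapped in a small helper. This is a minimal sketch, assuming each row is a tuple mixing integers and UTF-8 byte strings as in the question's output; decode_row is a hypothetical name and not part of psycopg2.

```python
# -*- coding: utf-8 -*-
def decode_row(row, encoding='utf-8'):
    """Return a copy of row with every byte-string field decoded to text."""
    # bytes is an alias of str in Python 2, so this check works on 2 and 3.
    return tuple(
        field.decode(encoding) if isinstance(field, bytes) else field
        for field in row
    )


# Sample rows copied from the output shown in the question.
rows = [
    (1, b'Casa das M\xc3\xa1quinas', b'R. Tr\xc3\xaas, Mineiros - Goi\xc3\xa1s'),
    (2, b'Ar do Z\xc3\xa9', b'Av. S\xc3\xa9tima, Mineiros - Goi\xc3\xa1s'),
]

# Non-string fields such as the integer id pass through unchanged.
companies = [decode_row(row) for row in rows]
```

The decoded companies list can then be passed to render_template, and iterating over individual fields in the template should no longer trigger the implicit ASCII decode.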