
Encoding on PostgreSQL, Python, Jinja2

I'm having an encoding problem in my application and haven't found a solution anywhere on the web.

Here is the scenario:

  • PostgreSQL with UTF-8 encoding (CREATE DATABASE xxxx WITH ENCODING 'UTF8')

  • Python logic also with UTF-8 encoding (# -*- coding: utf-8 -*-)

  • Jinja2 to render my HTML pages. Python and Jinja2 are used through Flask, the microframework I'm using.

The header of my pages has: <meta http-equiv="content-type" content="text/html; charset=utf-8"/>

When I use psycopg2 to run a simple query and print the result with Jinja2, this is what I get:

{% for company in list %}
    <li>
        {{ company }}
    </li>
{% endfor %}

(1, 'Casa das M\xc3\xa1quinas', 'R. Tr\xc3\xaas, Mineiros - Goi\xc3\xa1s')

(2, 'Ar do Z\xc3\xa9', 'Av. S\xc3\xa9tima, Mineiros - Goi\xc3\xa1s')

If I try to go deeper into the fields:

{% for company in list %}
    <li>
        {% for field in company %}
            <li>
                {{ field }}
            </li>
        {% endfor %}
    </li>
{% endfor %}

I get the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

However, if I print the list fields before sending them to Jinja2, I get the expected result (which is also how it is presented in PostgreSQL):

1 Casa das Máquinas R. Três, Mineiros - Goiás

2 Ar do Zé Av. Sétima, Mineiros - Goiás

When I get the error, Flask offers an option to "debug". This is where the code breaks:

File "/home/anonimou/Desktop/flask/lib/python2.7/site-packages/jinja2/_markupsafe/_native.py", line 21, in escape
    return Markup(unicode(s)

And I can also do:

[console ready]

>>> print s
Casa das Máquinas

>>> s
'Casa das M\xc3\xa1quinas'

>>> unicode(s)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

>>> s.decode('utf-8')
u'Casa das M\xe1quinas'

>>> s.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

>>> s.decode('utf-8').encode('utf-8')
'Casa das M\xc3\xa1quinas'

>>> print s.decode('utf-8').encode('utf-8')
Casa das Máquinas

>>> print s.decode('utf-8')
Casa das Máquinas

I've already tried splitting up the list and decoding and encoding the values in the Python code before sending them to Jinja2. I get the same error.

Sooo, not sure what I can do here. =(

Thanks in advance!

anonimou asked Feb 17 '23 12:02


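1 Answer

The issue is that psycopg2 returns byte strings by default in Python 2, and Jinja2's escaping calls unicode() on them, which decodes with the ASCII codec and fails on the first non-ASCII byte. From the psycopg2 manual: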

When reading data from the database, in Python 2 the strings returned are usually 8 bit str objects encoded in the database client encoding
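To see the failure in isolation, here is a minimal sketch that uses no database at all. It assumes the bytes shown in the question and spells out the ASCII decode that unicode(s) performs implicitly in Python 2, so it behaves the same on Python 2 and 3. The variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# The UTF-8 bytes that come back for "Casa das Máquinas" (taken from the question).
raw = b'Casa das M\xc3\xa1quinas'

# Python 2's unicode(raw) decodes with the ASCII codec; done explicitly here.
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding with the encoding the database actually uses succeeds.
text = raw.decode('utf-8')

print(ascii_error)
print(repr(text))
```

The exception reports position 10, the index of the 0xc3 byte that starts the two-byte sequence for "á", which matches the traceback in the question.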

So you can either:

  • Manually decode all of the data from UTF-8:

    # Decode the byte strings into Unicode objects using
    # the encoding you know that your database is using.
    # Each row is a tuple, so decode its str fields one by one.
    companies = [
        tuple(f.decode("utf-8") if isinstance(f, str) else f for f in company)
        for company in companies
    ]
    return render_template("companies.html", companies=companies)
    

or

  • Register the Unicode typecasters when you first import psycopg2, as per the note in the same section of the manual:

    Note: In Python 2, if you want to uniformly receive all your database input in Unicode, you can register the related typecasters globally as soon as Psycopg is imported:

    import psycopg2
    import psycopg2.extensions
    psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
    psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
    

    and then forget about this story.
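For the manual route, keep in mind that each row is a tuple mixing integers and byte strings, so the decoding has to happen field by field. A rough sketch, assuming a hypothetical helper called decode_row (not part of psycopg2) and sample rows copied from the question:

```python
def decode_row(row, encoding="utf-8"):
    """Return a copy of row with every byte string decoded to Unicode.

    decode_row is a hypothetical helper, not a psycopg2 function.
    On Python 2, bytes is just another name for str.
    """
    return tuple(
        field.decode(encoding) if isinstance(field, bytes) else field
        for field in row
    )


# Rows shaped like the ones in the question: (id, name, address).
rows = [
    (1, b'Casa das M\xc3\xa1quinas', b'R. Tr\xc3\xaas, Mineiros - Goi\xc3\xa1s'),
    (2, b'Ar do Z\xc3\xa9', b'Av. S\xc3\xa9tima, Mineiros - Goi\xc3\xa1s'),
]

# Non-string fields such as the integer id pass through unchanged.
companies = [decode_row(row) for row in rows]
```

The resulting companies list can be handed to render_template as is, since every text field is now a Unicode object.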

Sean Vieira answered Feb 20 '23 12:02