Using utf-8 characters in a Jinja2 template

I'm trying to use UTF-8 characters when rendering a template with Jinja2. Here is what my template looks like:

    <!DOCTYPE HTML>
    <html manifest="" lang="en-US">
    <head>
        <meta charset="UTF-8">
        <title>{{title}}</title>
    ...

The title variable is set something like this:

    index_variables = {'title':''}
    index_variables['title'] = myvar.encode("utf8")

    template = env.get_template('index.html')
    index_file = open(preview_root + "/" + "index.html", "w")

    index_file.write(
        template.render(index_variables)
    )
    index_file.close()

Now, the problem is that myvar is a message read from a message queue and can contain special UTF-8 characters (e.g. "Séptimo Cine").

The rendered template looks something like:

    ...
        <title>S\u00e9ptimo Cine</title>
    ...

and I want it to be:

    ...
        <title>Séptimo Cine</title>
    ...

I have made several tests but I can't get this to work.

  • I have tried to set the title variable without .encode("utf8"), but it throws an exception (ValueError: Expected a bytes object, not a unicode object), so my guess is that the initial message is unicode

  • I have used chardet.detect to get the encoding of the message (it's "ascii"), then did the following: myvar.decode("ascii").encode("cp852"), but the title is still not rendered correctly.

  • I also made sure that my template is a UTF-8 file, but it didn't make a difference.

Any ideas on how to do this?

alex.ac asked Mar 04 '14 20:03


2 Answers

TL;DR:

  • Pass Unicode to template.render()
  • Encode the rendered unicode result to a bytestring before writing it to a file

This had me puzzled for a while. Because you do

    index_file.write(
        template.render(index_variables)
    )

in one statement, that's basically just one line where Python is concerned, so the traceback you get is misleading: the exception I got when recreating your test case wasn't raised in template.render(index_variables) but in index_file.write(). So splitting the code up like this

    output = template.render(index_variables)
    index_file.write(output)

was the first step to diagnose where exactly the UnicodeEncodeError happens.

Jinja returns unicode when you let it render the template. You therefore need to encode the result to a bytestring before you can write it to a file:

index_file.write(output.encode('utf-8')) 
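As an alternative to encoding by hand, the file can be opened in text mode with an explicit encoding, so that the write call does the encoding itself. This is only a minimal sketch, assuming io.open from the standard library (available in Python 2.6+ and Python 3). The output string and the temporary path are stand-ins for illustration and do not come from the question:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Stand-in for the unicode string that template.render() returns.
output = u"<title>S\u00e9ptimo Cine</title>"

# Hypothetical output path, used only for this illustration.
out_path = os.path.join(tempfile.mkdtemp(), "index.html")

# In text mode with an explicit encoding, io.open() encodes the
# unicode string to UTF-8 when it is written.
with io.open(out_path, "w", encoding="utf-8") as index_file:
    index_file.write(output)

# Read the raw bytes back to see what was actually stored on disk.
with open(out_path, "rb") as check_file:
    raw_bytes = check_file.read()
```

Both approaches should put the same bytes on disk; the difference is only whether the encoding step is written out explicitly or delegated to the file object.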

The second error is that you pass a UTF-8 encoded bytestring to template.render(), but Jinja wants unicode. So, assuming your myvar contains UTF-8, you need to decode it to unicode first:

index_variables['title'] = myvar.decode('utf-8') 
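The question also reports that chardet detected "ascii" and that the rendered title contained a literal \u00e9. One possible explanation, which is an assumption and not something the question confirms, is that the queue message is JSON text, where non-ASCII characters are commonly written as \uXXXX escapes. If that is the case, decoding the bytes alone would leave the escape in place, and parsing the message as JSON might be the missing step. A sketch with a made-up message:

```python
import json

# Hypothetical raw message: a JSON string literal made of pure ASCII
# bytes, with the accented character written as a \u00e9 escape.
raw_message = b'"S\\u00e9ptimo Cine"'

# Decode the bytes to text, then let the JSON parser resolve the escape.
title = json.loads(raw_message.decode("ascii"))
```

If the message really is JSON, title is then a unicode string that can be passed to template.render() without any further decoding.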

So, to put it all together, this works for me:

    # -*- coding: utf-8 -*-

    from jinja2 import Environment, PackageLoader
    env = Environment(loader=PackageLoader('myproject', 'templates'))


    # Make sure we start with an utf-8 encoded bytestring
    myvar = 'Séptimo Cine'

    index_variables = {'title':''}

    # Decode the UTF-8 string to get unicode
    index_variables['title'] = myvar.decode('utf-8')

    template = env.get_template('index.html')

    with open("index_file.html", "wb") as index_file:
        output = template.render(index_variables)

        # jinja returns unicode - so `output` needs to be encoded to a bytestring
        # before writing it to a file
        index_file.write(output.encode('utf-8'))

Lukas Graf answered Sep 18 '22 14:09


Try changing your render command to this...

template.render(index_variables).encode( "utf-8" ) 

Jinja2's documentation says "This will return the rendered template as unicode string."

http://jinja.pocoo.org/docs/api/?highlight=render#jinja2.Template.render

Hope this helps!

Andrew Kloos answered Sep 20 '22 14:09