
How to convert UTF8 string into HTML string in python 2.5 for correct accent displaying?

Tags:

python

html

utf-8

My UTF-8 string, coming from a database (a CSV file encoded in UTF-8), is displayed like this in a browser by my main.py code: value = "roulement \u00e0 billes"

=> How can I convert any such string into HTML entities, such as value="roulement &agrave; billes", so that it displays correctly as roulement à billes in a browser?

I tried to add:

 # -*- coding: utf-8 -*-

on the first line of my file, and also:

 self.response.headers['Content-Type'] = 'text/html;charset=UTF-8'

but it doesn't change anything.

=> So maybe another way is to translate it into HTML entities? How can I do that?

Thank you.

user1459604 asked Mar 28 '26 03:03


2 Answers

First, make sure value is of type unicode and not str. Then

value.encode('ascii', 'xmlcharrefreplace')

should give you numeric character references (such as &#224;), which browsers render the same way as named HTML entities.

Python Unicode Documentation

>>> value = u"roulement \u00e0 billes"
>>> print value
roulement à billes
>>> print value.encode('ascii', 'xmlcharrefreplace')
roulement &#224; billes
>>>
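If the value arrives as a UTF-8 byte string (as it plausibly would from a UTF-8 CSV file; that is an assumption about the asker's setup), it has to be decoded to unicode before the encode call above. A minimal sketch, where to_char_refs is a hypothetical helper name and not part of any library:

```python
def to_char_refs(raw, encoding="utf-8"):
    """Hypothetical helper: return ASCII text with numeric character references."""
    # Decode only if we were handed bytes (str on Python 2) rather than unicode text.
    if not isinstance(raw, type(u"")):
        text = raw.decode(encoding)
    else:
        text = raw
    # Non-ASCII characters become &#NNN; references; decode back to text for convenience.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Simulate a UTF-8 byte string as it might come out of the CSV file (assumption).
raw = u"roulement \u00e0 billes".encode("utf-8")
result = to_char_refs(raw)
```

Here result should hold the text roulement &#224; billes, which can be written into the response as-is.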
StefanE answered Mar 29 '26 15:03


To embed unicode string literals in your code:

a) Make sure your source file is in UTF-8 (and add the # -*- coding line), then use the literals directly:

u'Zażółć gęślą jaźń'

b) Escape them in unicode literals:

u"roulement \u00e0 billes"

In both cases you need to use the unicode type, not str type, so prefix your literals with u.

>>> type("kos")
<type 'str'>
>>> type(u"kos")
<type 'unicode'>

> how to convert any of such string into HTML entities, such as value="roulement &agrave; billes" in order to display correctly as roulement à billes with a browser.

You shouldn't need to do this, except for the characters that interfere with HTML itself, like < or > and a couple more (& everywhere, and quotes inside attribute values).

Just encode your HTML output as UTF-8 and make sure the browser picks up the encoding: the Content-Type response header you set is fine, and you can also add <meta charset="UTF-8"> or <meta http-equiv="content-type" content="text/html; charset=UTF-8"> inside <head>. Browsers handle regional characters without trouble.
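As a rough illustration of that approach, the sketch below escapes only the HTML-significant characters with the standard library's xml.sax.saxutils.escape and then encodes the whole page as UTF-8. The render_page function and the page skeleton are hypothetical, not taken from the asker's main.py:

```python
from xml.sax.saxutils import escape

def render_page(text):
    """Hypothetical helper: build a small HTML page and return it as UTF-8 bytes."""
    # escape() replaces &, < and > only; accented characters are left untouched.
    body = escape(text)
    html = (u'<html><head><meta charset="UTF-8"></head>'
            u'<body>%s</body></html>') % body
    # Encode once, at the output boundary, to match the declared charset.
    return html.encode("utf-8")

page = render_page(u"roulement \u00e0 billes < 5 mm")
```

The resulting bytes can be written to the response together with the text/html;charset=UTF-8 header from the question, so the à travels as a real UTF-8 character and only < is turned into &lt;.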

Kos answered Mar 29 '26 15:03