My UTF-8 string, which comes from a database (a CSV file encoded in UTF-8), is displayed like this in a browser by my main.py code: value="roulement \u00e0 billes"
=> How can I convert any such string into HTML entities, such as value="roulement &agrave; billes", so that it displays correctly as roulement à billes in a browser?
I tried adding:
# -*- coding: utf-8 -*-
on the first line of my file, and also:
self.response.headers['Content-Type'] = 'text/html;charset=UTF-8'
but it doesn't change anything.
=> So maybe another way is to translate it into HTML entities? How can I do that?
Thank you.
First, make sure value is of type unicode and not a byte string (str). Then:
value.encode('ascii', 'xmlcharrefreplace')
This should give you the HTML entities (as numeric character references).
Python Unicode Documentation
>>> value = u"roulement \u00e0 billes"
>>> print value
roulement à billes
>>> print value.encode('ascii', 'xmlcharrefreplace')
roulement &#224; billes
>>>
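The same conversion can be wrapped in a small helper. This is only a sketch: the name to_html_entities and the assumption that byte input is UTF-8 encoded are mine, not part of the question's code.

```python
def to_html_entities(value, encoding="utf-8"):
    """Return `value` as ASCII-only text with numeric character references."""
    # Byte strings (for example raw CSV data) are decoded to unicode first.
    # The default encoding is an assumption; pass another one if needed.
    if isinstance(value, bytes):
        value = value.decode(encoding)
    # Non-ASCII characters become &#NNN; references; ASCII is left untouched.
    return value.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_html_entities(u"roulement \u00e0 billes"))  # roulement &#224; billes
```

Note that this does not escape <, > or &, so it is not a substitute for HTML escaping.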
To embed unicode string literals in your code:
a) Make sure your source file is in UTF-8 (and add the # -*- coding line), then use the literals directly:
u'Zażółć gęślą jaźń'
b) Escape them in unicode literals:
u"roulement \u00e0 billes"
In both cases you need to use the unicode type, not str type, so prefix your literals with u.
>>> type("kos")
<type 'str'>
>>> type(u"kos")
<type 'unicode'>
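Data read from a file or database usually arrives as bytes rather than as a literal, so it has to be decoded before it is a unicode value. A minimal sketch, assuming the CSV content really is UTF-8 (the variable names are only illustrative):

```python
# UTF-8 bytes as they might come out of the CSV file (illustrative value).
raw = b"roulement \xc3\xa0 billes"

# Decode once, as early as possible, and work with unicode text afterwards.
text = raw.decode("utf-8")

# The decoded text is the same value as the escaped unicode literal.
print(text == u"roulement \u00e0 billes")  # True
```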
how to convert any of such string into HTML entities, such as value="roulement &agrave; billes" in order to display correctly as roulement à billes with a browser.
You shouldn't need to do this, except for the characters that interfere with HTML itself, such as <, > and & (plus quotes inside attribute values).
Just encode your HTML file as UTF-8 and make sure that the browser picks up the encoding (the response Content-Type header is fine; you can also put <meta charset="UTF-8"> or <meta http-equiv="content-type" content="text/html; charset=UTF-8"> inside <head>). Browsers should then handle the regional characters without trouble.
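As a rough illustration of that approach, the sketch below escapes only the HTML-special characters and encodes the finished page as UTF-8. The function render_page and the page layout are hypothetical; escape comes from the standard library module xml.sax.saxutils, which handles &, < and >.

```python
from xml.sax.saxutils import escape


def render_page(value):
    """Build a UTF-8 encoded HTML page around `value` (a unicode string)."""
    # Only &, < and > are escaped; accented characters are left as they are.
    body = u"<p>%s</p>" % escape(value)
    page = (u'<!DOCTYPE html><html><head><meta charset="UTF-8"></head>'
            u"<body>%s</body></html>") % body
    # Encode once, at the output boundary, to match the declared charset.
    return page.encode("utf-8")


html_bytes = render_page(u"roulement \u00e0 billes < 5 mm")
```

The resulting bytes are what would be written to the response, together with the Content-Type header that declares charset=UTF-8.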