I'm using Python 2.6.5 and when I run the following in the Python shell, I get:
>>> print u'Andr\xc3\xa9'
AndrÃ©
>>> print 'Andr\xc3\xa9'
André
>>>
What's the explanation for the above? Given u'Andr\xc3\xa9', how can I display that value properly in an HTML page so that it shows André instead of AndrÃ©?
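'\xc3\xa9' is the UTF-8 encoding of the Unicode character u'\u00e9' (which can also be written as u'\xe9'). Inside a unicode literal, \xc3 and \xa9 are read as two separate code points, U+00C3 (Ã) and U+00A9 (©), which is why u'Andr\xc3\xa9' does not print as André. The string you want is u'Andr\u00e9' or, equivalently, u'Andr\xe9'.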
You can convert from one to the other:
>>> 'Andr\xc3\xa9'.decode('utf-8')
u'Andr\xe9'
>>> u'Andr\xe9'.encode('utf-8')
'Andr\xc3\xa9'
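If you are stuck with a value that has already been mis-decoded into u'Andr\xc3\xa9', one way to recover it is a round trip through latin-1. This is only a sketch: the helper name repair_mojibake is made up for illustration, and it assumes every character in the string is below U+0100 and that the underlying bytes really are UTF-8.

```python
def repair_mojibake(text):
    # latin-1 maps code points U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
    # so encoding recovers the original UTF-8 bytes, which are then decoded properly.
    return text.encode('latin-1').decode('utf-8')

fixed = repair_mojibake(u'Andr\xc3\xa9')
```

Here fixed is u'Andr\xe9'. If the string contains characters outside latin-1, or the bytes are not valid UTF-8, the call raises UnicodeEncodeError or UnicodeDecodeError instead of guessing.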
Note that print 'Andr\xc3\xa9' gave you the expected result only because your terminal's encoding is UTF-8: print writes the raw bytes and the terminal decides how to interpret them. For example, on Windows I get:
>>> print 'Andr\xc3\xa9'
AndrÃ©
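The difference can be reproduced without changing terminals by decoding the same bytes two ways. This sketch assumes the Windows display interprets the bytes as cp1252; a console using another code page (cp437, say) would show different garbage.

```python
raw = b'Andr\xc3\xa9'  # the UTF-8 bytes that print sends to the terminal

as_utf8 = raw.decode('utf-8')     # how a UTF-8 terminal reads them
as_cp1252 = raw.decode('cp1252')  # how a cp1252 display reads the same bytes

assert as_utf8 == u'Andr\xe9'        # 5 characters, ending in e-acute
assert as_cp1252 == u'Andr\xc3\xa9'  # 6 characters, ending in A-tilde + copyright sign
```

The bytes never change; only the interpretation does, which is why it is safer to work with unicode objects and encode once, at the point of output.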
As for outputting HTML, it depends on which web framework you use and what encoding you output in the HTML page. Some frameworks (e.g. Django) will convert unicode values to the correct encoding automatically, while others will require you to do so manually.
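If you have to do it manually, here is a rough sketch of two common options. The page template is just an illustration, not what any particular framework produces, and it does not escape markup characters such as < or & in the value.

```python
name = u'Andr\xe9'

# Option 1: emit UTF-8 bytes and declare that encoding in the page itself.
page = u'<html><head><meta charset="utf-8"></head><body>%s</body></html>' % name
utf8_bytes = page.encode('utf-8')

# Option 2: stay ASCII-only; each non-ASCII character becomes a numeric
# character reference, which browsers render regardless of the page encoding.
ascii_bytes = name.encode('ascii', 'xmlcharrefreplace')
```

With option 2, ascii_bytes is 'Andr&#233;', which displays as André in the browser. With option 1, the Content-Type header sent by the server should also say charset=utf-8 so it agrees with the page.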
Try this:
>>> unicode('Andr\xc3\xa9', 'utf-8')
u'Andr\xe9'
>>> print u'Andr\xe9'
André
That may answer your question.
EDIT: or see the above answer