How do you convert HTML entities to Unicode and vice versa in Python?
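Note that htmlentities(), html_entity_decode() and get_html_translation_table() are PHP functions, not Python ones. In PHP, htmlentities() converts characters to HTML entities, html_entity_decode() converts entities back to characters, and get_html_translation_table() returns the translation table used by htmlentities(). The closest Python standard-library equivalents are html.escape(), html.unescape() and the lookup tables in the html.entities module.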
HTML encoding converts characters that are not allowed in HTML into character-entity equivalents; HTML decoding reverses the encoding. For example, when embedded in a block of text, the characters < and > are encoded as &lt; and &gt; for HTTP transmission.
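In Python 3 this encode/decode round trip can be sketched with the standard-library html module. The sample string below is made up for illustration:

```python
import html

# Hypothetical sample markup, used only for illustration.
raw = '<p class="x">Fish & chips</p>'

# html.escape() replaces &, < and >; with the default quote=True
# it also replaces double and single quotes.
encoded = html.escape(raw)

# html.unescape() turns named and numeric character references
# back into the characters they stand for.
decoded = html.unescape(encoded)

print(encoded)
print(decoded == raw)
```

Note that html.escape() only touches the characters that are special in HTML; it leaves non-ASCII characters such as é alone.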
&gt; and &lt; are the character entity references for the > and < characters in HTML. You cannot use the less-than (<) or greater-than (>) signs directly in your text, because the browser will confuse them with tags. To get around this, use entity names (for example &gt;) or entity numbers (for example &#60;).
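The difference between entity names and entity numbers can be seen from Python with the html.entities tables. This is a minimal sketch; to_entity is a hypothetical helper name, not a standard-library function:

```python
import html
from html.entities import codepoint2name, name2codepoint

# Named, decimal and hexadecimal references all decode to the same character.
named = html.unescape('&lt;')
decimal = html.unescape('&#60;')
hexadecimal = html.unescape('&#x3C;')

# The lookup tables map between entity names and Unicode code points.
lt_codepoint = name2codepoint['lt']   # code point of '<'
gt_name = codepoint2name[ord('>')]    # entity name of '>'


def to_entity(ch):
    """Hypothetical helper: prefer an entity name, fall back to a number."""
    name = codepoint2name.get(ord(ch))
    return f'&{name};' if name else f'&#{ord(ch)};'


print(named, decimal, hexadecimal, lt_codepoint, gt_name)
print(to_entity('>'), to_entity('\u00e9'), to_entity('\u0448'))
```

codepoint2name only covers the HTML 4 entity set, which is why the helper falls back to a numeric reference for characters without a name.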
As to the "vice versa" (which I needed myself, leading me to find this question, which didn't help, and subsequently another site which had the answer):
u'some string'.encode('ascii', 'xmlcharrefreplace')
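will return an ASCII-only string with any non-ASCII characters turned into XML (HTML) numeric character references such as &#233;. In Python 2 the result is a plain str; in Python 3 it is a bytes object, so append .decode('ascii') if you need a str. Note that this does not escape <, > or &.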
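A Python 3 sketch of both directions, assuming you want a str rather than bytes at the end. The sample text is invented for illustration:

```python
import html

# Hypothetical sample text containing non-ASCII characters.
text = 'caf\u00e9 \u2615 <b>'

# Unicode -> numeric character references. encode() returns bytes in
# Python 3, so decode back to str. Markup characters are left untouched.
refs_only = text.encode('ascii', 'xmlcharrefreplace').decode('ascii')

# To also escape <, > and &, run html.escape() first.
fully_escaped = html.escape(text).encode('ascii', 'xmlcharrefreplace').decode('ascii')

# References -> Unicode.
round_trip = html.unescape(fully_escaped)

print(refs_only)
print(fully_escaped)
print(round_trip == text)
```

Escaping before encoding matters: doing it the other way round would turn the & in each &#...; reference into &amp;.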