Convert HTML entities to Unicode and vice versa

How do you convert HTML entities to Unicode and vice versa in Python?

asked Mar 31 '09 at 15:03 by hekevintran


People also ask

How do you encode an HTML entity?

In PHP, the htmlentities() function converts characters to HTML entities. To convert HTML entities back to characters, use html_entity_decode(); get_html_translation_table() returns the translation table that htmlentities() uses.
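Python's standard library has no exact counterpart of htmlentities(): html.escape() only escapes &, <, > and quotes. As a rough sketch, the helper below (htmlentities_like is a hypothetical name, not a library function) uses the html.entities.codepoint2name table to emit a named entity wherever HTML 4 defines one, and falls back to a numeric reference for other non-ASCII characters (a choice made here for illustration). html.unescape() plays the role of html_entity_decode().

```python
import html
from html.entities import codepoint2name


def htmlentities_like(text: str) -> str:
    """Rough Python approximation of PHP's htmlentities() (hypothetical helper).

    Uses a named entity when HTML 4 defines one (codepoint2name),
    a numeric reference for other non-ASCII characters, and leaves
    the remaining ASCII characters untouched.
    """
    parts = []
    for ch in text:
        code = ord(ch)
        name = codepoint2name.get(code)
        if name is not None:
            parts.append(f"&{name};")   # e.g. é -> &eacute;, < -> &lt;
        elif code > 127:
            parts.append(f"&#{code};")  # no HTML 4 name: numeric reference
        else:
            parts.append(ch)            # plain ASCII stays as-is
    return "".join(parts)


encoded = htmlentities_like("café <b>")
print(encoded)                 # caf&eacute; &lt;b&gt;
print(html.unescape(encoded))  # café <b>
```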

What is HTML encoding and decoding?

HTML encoding converts characters that have a special meaning in HTML (or that cannot be represented in the document's character encoding) into character-entity equivalents; HTML decoding reverses the encoding. For example, when embedded in a block of text, the characters < and > are encoded as &lt; and &gt; so that the browser displays them instead of parsing them as markup.
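In Python 3 (html.unescape() needs 3.4 or later), the standard html module performs both halves of this round trip. A minimal example:

```python
import html

text = "if a < b && b > c"
encoded = html.escape(text)       # &, <, > (and quotes, by default) become entities
decoded = html.unescape(encoded)  # entities turned back into characters

print(encoded)          # if a &lt; b &amp;&amp; b &gt; c
print(decoded == text)  # True
```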

What is HTML &gt;?

&gt; and &lt; are the character entity references for the > and < characters in HTML. A literal less-than sign (<) in your text can be mistaken by the browser for the start of a tag, so these characters are usually written as entity names (&gt;, &lt;) or as numeric character references (&#62;, &#60;).
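To see that an entity name and a numeric reference denote the same character, Python's html.unescape() can decode every form, and html.entities.name2codepoint maps HTML 4 entity names to code points:

```python
import html
from html.entities import name2codepoint

# Named, decimal and hexadecimal references for the same characters
print(html.unescape("&lt; &#60; &#x3C;"))  # < < <
print(html.unescape("&gt; &#62; &#x3E;"))  # > > >

# The name table gives the code point behind a named entity
print(name2codepoint["lt"], name2codepoint["gt"])  # 60 62
```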


1 Answer

As for the "vice versa" direction (I needed it myself; this question didn't help, but another site had the answer):

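u'some string'.encode('ascii', 'xmlcharrefreplace')

will return a byte string (a plain str in Python 2, bytes in Python 3) with every non-ASCII character turned into an XML/HTML numeric character reference; for example, é becomes &#233;. In Python 3, append .decode('ascii') to get a str back.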
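A Python 3 sketch of both directions, assuming the input is a str; to_char_refs and from_entities are hypothetical helper names. The xmlcharrefreplace handler leaves &, < and > untouched, so this sketch passes the text through html.escape() first in case it contains markup characters.

```python
import html


def to_char_refs(text: str) -> str:
    """Hypothetical helper: escape markup characters, then turn
    non-ASCII characters into &#NNN; references."""
    escaped = html.escape(text, quote=False)  # escapes &, <, > only
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_entities(text: str) -> str:
    """Hypothetical helper: decode named (&eacute;) and numeric
    (&#233;, &#xE9;) references back to Unicode."""
    return html.unescape(text)


s = "naïve café <tag>"
encoded = to_char_refs(s)
print(encoded)                      # na&#239;ve caf&#233; &lt;tag&gt;
print(from_entities(encoded) == s)  # True
```

Drop the html.escape() call if the input is known to contain no &, < or > characters; the encode step alone then matches the one-liner above.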

answered Oct 20 '22 at 21:10 by Isaac