Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace html entities with the corresponding utf-8 characters in Python 2.6

I have a html text like this:

<xml ... >

and I want to convert it to something readable:

<xml ...>

Any easy (and fast) way to do it in Python?

like image 230
Alexandru Avatar asked Apr 08 '09 14:04

Alexandru


People also ask

How do you escape HTML tags in Python?

With the help of html. escape() method, we can convert the html script into a string by replacing special characters with the string with ascii characters by using html. escape() method. Return : Return a string of ascii character script from html.

What is HTML Unescape?

The html. unescape() method helps us to convert the ascii string into html script by replacing ascii characters with special HTML characters. This tool will convert HTML entities to a string or convert plain text to HTML entities.


2 Answers

Python >= 3.4

Official documentation for HTMLParser: Python 3

>>> from html import unescape
>>> unescape('&copy; &euro;')
© €

Python < 3.5

Official documentation for HTMLParser: Python 3

>>> from html.parser import HTMLParser
>>> pars = HTMLParser()
>>> pars.unescape('&copy; &euro;')
© €

Note: this was deprecated in the favor of html.unescape().

Python 2.7

Official documentation for HTMLParser: Python 2.7

>>> import HTMLParser
>>> pars = HTMLParser.HTMLParser()
>>> pars.unescape('&copy; &euro;')
u'\xa9 \u20ac'
>>> print _
© €
like image 146
vartec Avatar answered Sep 20 '22 17:09

vartec


Modern Python 3 approach:

>>> import html
>>> html.unescape('&copy; &euro;')
© €

https://docs.python.org/3/library/html.html

like image 29
fmalina Avatar answered Sep 20 '22 17:09

fmalina