From escaped HTML to regular HTML? - Python

I used BeautifulSoup to handle XML files that I have collected through a REST API.

The responses contain HTML code, but BeautifulSoup escapes all the HTML tags so it can be displayed nicely.

Unfortunately I need the HTML code.


How would I go about transforming the escaped HTML back into proper markup?


Help would be very much appreciated!
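A minimal sketch of the underlying idea, using the standard-library `xml.etree.ElementTree` parser in place of BeautifulSoup (a swapped-in tool, not the asker's setup): an XML parser decodes entities such as `&lt;` when it reads element text, so the HTML payload comes back as real markup once you take the element's text rather than its re-serialized form. The response string and the `extract_html` helper are hypothetical. BeautifulSoup's `get_text()` may behave similarly, but that is not verified here.

```python
import xml.etree.ElementTree as ET


def extract_html(xml_text: str, tag: str) -> str:
    """Hypothetical helper: return the decoded text of the first <tag> child."""
    root = ET.fromstring(xml_text)
    element = root.find(tag)  # direct children only; use ".//" + tag to search deeper
    if element is None or element.text is None:
        return ""
    # The XML parser has already turned &lt;p&gt; back into <p>
    return element.text


# Hypothetical REST response whose <body> holds entity-escaped HTML
response = "<response><body>&lt;p&gt;Fish &amp;amp; chips&lt;/p&gt;</body></response>"
print(extract_html(response, "body"))  # <p>Fish &amp; chips</p>
```

Note that `&amp;amp;` decodes one level to `&amp;`, which is exactly what valid HTML needs.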

RadiantHex asked Mar 19 '10 04:03


People also ask

What does an escape character do in Python?

To insert characters that are illegal in a string, use an escape character. An escape character is a backslash \ followed by the character you want to insert.

What does html.escape() do?

html.escape(s, quote=True) makes a string safe to embed in HTML by replacing the special characters &, < and > with the entities &amp;, &lt; and &gt;. If quote is true (the default), the double quote " and the single quote ' are also replaced, with &quot; and &#x27;. It returns the escaped string.
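A short illustration of html.escape() with and without the `quote` flag; the sample snippet is made up for the example.

```python
import html

snippet = '<a href="/faq">Tom & Jerry</a>'

# quote=True (the default) also escapes " and '
escaped = html.escape(snippet)
# quote=False leaves quotation marks alone
escaped_no_quotes = html.escape(snippet, quote=False)

print(escaped)            # &lt;a href=&quot;/faq&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
print(escaped_no_quotes)  # &lt;a href="/faq"&gt;Tom &amp; Jerry&lt;/a&gt;
```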

What does html.unescape() do in Python?

html.unescape(s) replaces named and numeric character references (for example &gt;, &#62; and &#x3e;) with the corresponding Unicode characters. For example, the string &lt;div&gt; is decoded to <div>.
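A small sketch showing that html.unescape() handles named (`&eacute;`) and numeric (`&#8211;`) references as well as the basic ones, which makes it a complete answer to the question above in Python 3. The escaped sample string is invented for illustration.

```python
import html

escaped = "&lt;p class=&quot;lead&quot;&gt;Caf&eacute; &amp; bar &#8211; open&lt;/p&gt;"
markup = html.unescape(escaped)
print(markup)  # <p class="lead">Café & bar – open</p>

# Round trip: unescaping the output of html.escape() gives the original back
original = '<div title="5 > 3">Q&A</div>'
assert html.unescape(html.escape(original)) == original
```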


2 Answers

I think you want xml.sax.saxutils.unescape from the Python standard library.

E.g.:

>>> from xml.sax import saxutils as su
>>> s = '&lt;foo&gt;bar&lt;/foo&gt;'
>>> su.unescape(s)
'<foo>bar</foo>'
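One caveat worth sketching: by default xml.sax.saxutils.unescape() only decodes &amp;, &lt; and &gt;. Other entities such as &quot; can be handled through its optional `entities` dictionary. The sample string below is made up.

```python
from xml.sax.saxutils import unescape

s = '&lt;a href=&quot;/faq&quot;&gt;Q&amp;A&lt;/a&gt;'

# Only &amp;, &lt; and &gt; are decoded by default
default = unescape(s)
# Extra entities are passed as a dict of {entity: replacement}
with_quotes = unescape(s, {"&quot;": '"', "&#x27;": "'"})

print(default)      # <a href=&quot;/faq&quot;>Q&A</a>
print(with_quotes)  # <a href="/faq">Q&A</a>
```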
Alex Martelli answered Oct 19 '22 05:10


You could try the urllib module.

It has an unquote() function (urllib.parse.unquote() in Python 3) that might suit your needs.
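A quick check, assuming Python 3's urllib.parse, of why unquote() is probably not the right tool here: it decodes URL percent-escapes such as %3C, not HTML entities such as &lt;.

```python
from urllib.parse import unquote

# unquote() decodes %XX percent-escapes used in URLs...
print(unquote("%3Cfoo%3Ebar%3C/foo%3E"))      # <foo>bar</foo>
# ...but leaves HTML entities untouched
print(unquote("&lt;foo&gt;bar&lt;/foo&gt;"))  # &lt;foo&gt;bar&lt;/foo&gt;
```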

Edit: on second thought (and after rereading your question), you might just want to use str.replace(), like so:

s = s.replace('&lt;', '<')  # str.replace() returns a new string, so assign it back
s = s.replace('&gt;', '>')
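If you go the str.replace() route, a sketch that also decodes &amp; needs to do that replacement last; otherwise an escaped entity like "&amp;lt;" is decoded twice. The `naive_unescape` name is hypothetical, and for anything beyond these three entities the standard library's html.unescape() is more complete.

```python
def naive_unescape(s: str) -> str:
    """Hypothetical helper: decode only &lt;, &gt; and &amp; with str.replace()."""
    # Strings are immutable; each replace() returns a new string
    s = s.replace("&lt;", "<").replace("&gt;", ">")
    # Decode &amp; last, otherwise "&amp;lt;" would become "&lt;" and then "<"
    return s.replace("&amp;", "&")


print(naive_unescape("&lt;b&gt;bold&lt;/b&gt;"))  # <b>bold</b>
print(naive_unescape("&amp;lt;"))                 # &lt;
```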
Nathan Osman answered Oct 19 '22 03:10