I used BeautifulSoup to handle XML files that I have collected through a REST API.
The responses contain HTML code, but BeautifulSoup escapes all the HTML tags so it can be displayed nicely.
Unfortunately I need the HTML code.
How would I go on about transforming the escaped HTML into proper markup?
Help would be very much appreciated!
To insert characters that are illegal in a string, use an escape character. An escape character is a backslash \ followed by the character you want to insert.
With the help of html. escape() method, we can convert the html script into a string by replacing special characters with the string with ascii characters by using html. escape() method. Syntax : html.escape(String) Return : Return a string of ascii character script from html.
html. unescape() replaces the entity names or entity numbers of the reserved HTML characters with its original character representation. For example, the string <div\> will be decoded to <div> .
I think you want xml.sax.saxutils.unescape from the Python standard library.
E.g.:
>>> from xml.sax import saxutils as su
>>> s = '<foo>bar</foo>'
>>> su.unescape(s)
'<foo>bar</foo>'
You could try the urllib module?
It has a method unquote()
that might suit your needs.
Edit: on second thought, (and more reading of your question) you might just want to just use string.replace()
Like so:
string.replace('<','<')
string.replace('>','>')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With