I used BeautifulSoup to parse XML files that I collected through a REST API. The responses contain HTML code, but BeautifulSoup escapes all the HTML tags so that they display nicely. Unfortunately, I need the HTML code itself.

How would I go about transforming the escaped HTML back into proper markup?

Help would be very much appreciated!

From escaped html -> to regular html? - Python

2 Answers

I think you want xml.sax.saxutils.unescape from the Python standard library.

E.g.:

>>> from xml.sax import saxutils as su
>>> s = '&lt;foo&gt;bar&lt;/foo&gt;'
>>> su.unescape(s)
'<foo>bar</foo>'
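A hedged addition to the above: by default, saxutils.unescape only handles &amp;amp;, &amp;lt; and &amp;gt;. Other entities have to be passed in explicitly, and on Python 3.4+ the standard library's html.unescape decodes named and numeric entities as well. A minimal sketch, where the sample string is made up for illustration:

```python
import html
from xml.sax import saxutils as su

# Hypothetical sample: escaped markup that also contains a quote entity.
escaped = '&lt;a href=&quot;/x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;'

# saxutils.unescape handles only &amp;, &lt; and &gt; unless told otherwise,
# so &quot; is left untouched here.
basic = su.unescape(escaped)

# Extra entities can be passed as a dict mapping entity -> replacement.
with_quotes = su.unescape(escaped, {'&quot;': '"'})

# html.unescape (Python 3.4+) decodes named and numeric HTML entities.
full = html.unescape(escaped)

print(basic)
print(with_quotes)
print(full)
```

Which one fits depends on the data: if the responses only ever contain the three core XML escapes, saxutils.unescape is enough; if they may contain arbitrary HTML entities, html.unescape is probably the safer choice.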

181

answered Oct 19 '22 05:10

Alex Martelli

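You could try the urllib module?

It has a function unquote() that might suit your needs.

Edit: on second thought (and after reading your question more closely), unquote() only decodes URL percent-escapes such as %3C, so you might just want to use str.replace(). Note that it returns a new string rather than modifying the original, so assign the result.

Like so:

s = s.replace('&lt;', '<')
s = s.replace('&gt;', '>')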
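If you go the manual route, the order of the replacements matters: &amp;amp; has to be handled last, otherwise text such as &amp;amp;lt; (a literal "&amp;lt;" that was itself escaped) would be unescaped twice. A small sketch under that assumption; the helper name unescape_basic is made up here and is not part of any library:

```python
def unescape_basic(text):
    """Undo the three core XML/HTML escapes: &lt;, &gt; and &amp;."""
    # str.replace returns a new string, so the result must be reassigned.
    text = text.replace('&lt;', '<')
    text = text.replace('&gt;', '>')
    # Replace &amp; last so '&amp;lt;' becomes '&lt;' rather than '<'.
    text = text.replace('&amp;', '&')
    return text


print(unescape_basic('&lt;foo&gt;bar&lt;/foo&gt;'))
print(unescape_basic('&amp;lt;'))
```

This covers only those three entities; anything else (for example &amp;quot; or numeric references) is left as-is.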

answered Oct 19 '22 03:10

Nathan Osman


Tags: python, html, escaping, beautifulsoup, lxml

RadiantHex
