I'm trying to make a div element from the string below. Since my string contains HTML entities, the & that starts each entity is being escaped as &amp; in the output, so the entities are displayed as plain text. How can I avoid this so the entities are rendered properly?
import lxml.html
from lxml import etree

s = 'Actress Adamari López And Amgen Launch Spanish-Language Chemotherapy: Myths Or Facts™ Website And Resources'
div = etree.Element("div")
div.text = s
lxml.html.tostring(div)
Output:
<div>Actress Adamari L&#243;pez And Amgen Launch Spanish-Language Chemotherapy: Myths Or Facts&#8482; Website And Resources</div>
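A minimal sketch of two situations that can lead to output like this, assuming Python 3 and lxml. If the string holds literal characters, as in the code above, the default tostring() call returns ASCII bytes, so ó and ™ come back as numeric character references, while encoding='unicode' keeps them as characters. If the string instead holds entity references as text (the hypothetical with_refs below), .text stores them verbatim and lxml escapes each &, so here the references are decoded first with the standard library's html.unescape, a technique not taken from the answer. Variable names are illustrative.

```python
import html

import lxml.html
from lxml import etree

# Case 1: literal characters, as in the question's code.
literal = ('Actress Adamari López And Amgen Launch Spanish-Language '
           'Chemotherapy: Myths Or Facts™ Website And Resources')
div = etree.Element("div")
div.text = literal
# Default: ASCII bytes, so non-ASCII characters become character references.
default_out = lxml.html.tostring(div)
# encoding='unicode': a str that keeps the characters themselves.
unicode_out = lxml.html.tostring(div, encoding='unicode')

# Case 2 (hypothetical input): entity references stored as plain text.
# .text is taken literally, so lxml would escape each "&" as "&amp;";
# decoding the references first avoids the double escaping.
with_refs = ('Actress Adamari L&#243;pez And Amgen Launch Spanish-Language '
             'Chemotherapy: Myths Or Facts&#8482; Website And Resources')
div2 = etree.Element("div")
div2.text = html.unescape(with_refs)
decoded_out = lxml.html.tostring(div2, encoding='unicode')

print(default_out)
print(unicode_out)
print(decoded_out == unicode_out)
```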
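You can specify the encoding when calling tostring(). By default it returns ASCII bytes, so non-ASCII characters such as ó and ™ are written as numeric character references (&#243;, &#8482;); with encoding='unicode' you get a string containing the characters themselves: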
>>> from lxml.html import fromstring, tostring
>>> s = 'Actress Adamari López And Amgen Launch Spanish-Language Chemotherapy: Myths Or Facts™ Website And Resources'
>>> el = fromstring(s)
>>> print(tostring(el, encoding='unicode'))
<p>Actress Adamari López And Amgen Launch Spanish-Language Chemotherapy: Myths Or Facts™ Website And Resources</p>
As a side note, you should use lxml.html.tostring() rather than lxml.etree.tostring() when dealing with HTML data. From the lxml documentation:
Note that you should use lxml.html.tostring and not lxml.tostring. lxml.tostring(doc) will return the XML representation of the document, which is not valid HTML. In particular, things like <script src="..."></script> will be serialized as <script src="..." />, which completely confuses browsers.
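A small sketch of the difference the quoted note describes, assuming Python 3; the app.js URL is a placeholder. lxml.etree.tostring() uses XML serialization by default, while lxml.html.tostring() defaults to the HTML method.

```python
import lxml.html
from lxml import etree

# Placeholder markup: an external script with no inline content.
doc = lxml.html.fromstring('<div><script src="app.js"></script></div>')

# XML serialization: the empty <script> collapses to a self-closing tag.
xml_out = etree.tostring(doc, encoding='unicode')
# HTML serialization: the explicit </script> closing tag is kept.
html_out = lxml.html.tostring(doc, encoding='unicode')

print(xml_out)
print(html_out)
```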