How to prevent lxml from modifying tags
from lxml import etree
from lxml.html.soupparser import fromstring
html = '<iframe width="560" height="315" src="" frameborder="0" allowfullscreen></iframe>'
root = fromstring(html)
print etree.tostring(root,encoding='utf-8')
It prints the short (self-closing) version of the tag:
'<iframe width="560" height="315" src="" frameborder="0" allowfullscreen/>'
How can I prevent this? The needed output is:
'<iframe width="560" height="315" src="" frameborder="0" allowfullscreen></iframe>'
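Use tostring() with method="html". The default XML serializer collapses empty elements into self-closing tags, while the HTML serializer writes an explicit closing tag for non-void elements such as iframe: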
print etree.tostring(root.find('iframe'), encoding='utf-8', method="html")
Demo (Python 2 session):
>>> from lxml import etree
>>> from lxml.html.soupparser import fromstring
>>>
>>> html = '<iframe width="560" height="315" src="" frameborder="0" allowfullscreen></iframe>'
>>> root = fromstring(html)
>>> print etree.tostring(root.find('iframe'), encoding='utf-8', method="html")
<iframe allowfullscreen="allowfullscreen" frameborder="0" height="315" src="" width="560"></iframe>
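For comparison, here is a minimal Python 3 sketch of the same idea. It assumes lxml.html.fragment_fromstring in place of the soupparser used above (so BeautifulSoup is not required), and the variable names are illustrative only. How the boolean attribute allowfullscreen is written can differ between parsers and lxml versions, so the comments describe only how each result ends.

```python
from lxml import etree, html

snippet = '<iframe width="560" height="315" src="" frameborder="0" allowfullscreen></iframe>'

# Assumption: parse with lxml's own HTML parser instead of soupparser.
# For a single-element fragment this returns the <iframe> element itself.
iframe = html.fragment_fromstring(snippet)

# Default serialization method is "xml": an element without content
# is written in the short form, ending in "/>".
as_xml = etree.tostring(iframe, encoding="unicode")

# method="html" writes an explicit closing tag, ending in "></iframe>".
as_html = etree.tostring(iframe, encoding="unicode", method="html")

# lxml.html.tostring uses method="html" by default.
via_html_module = html.tostring(iframe, encoding="unicode")

print(as_xml)
print(as_html)
print(via_html_module)
```

Passing encoding="unicode" makes tostring() return a str rather than bytes, which keeps print() output free of the b'...' prefix on Python 3.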