

Tags: python, html, lxml

How to prevent lxml from modifying tags

from lxml import etree
from lxml.html.soupparser import fromstring

html = '<iframe width="560" height="315" src="" frameborder="0" allowfullscreen></iframe>'
root = fromstring(html)
print(etree.tostring(root, encoding='utf-8'))

It prints the short (self-closing) form of the tag:

'<iframe width="560" height="315" src="" frameborder="0" allowfullscreen/>'

How can I prevent this? The output I need is:

'<iframe width="560" height="315" src="" frameborder="0" allowfullscreen></iframe>'

Evg asked Jun 06 '26 01:06


1 Answer

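Use tostring() with method="html". The default method, "xml", serializes an element that has no children or text as a self-closing tag, while the HTML serializer writes an explicit closing tag for non-void elements such as iframe: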

print(etree.tostring(root.find('iframe'), encoding='utf-8', method="html"))

Demo:

>>> from lxml import etree
>>> from lxml.html.soupparser import fromstring
>>>
>>> html = '<iframe width="560" height="315" src="" frameborder="0" allowfullscreen></iframe>'
>>> root = fromstring(html)
>>> print(etree.tostring(root.find('iframe'), encoding='utf-8', method="html"))
<iframe allowfullscreen="allowfullscreen" frameborder="0" height="315" src="" width="560"></iframe>
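
For Python 3, the contrast between the two serialization methods can be sketched as follows. This is only an illustration under some assumptions that are not in the original answer: it parses with lxml.html.fragment_fromstring instead of soupparser.fromstring (so BeautifulSoup is not required), uses encoding="unicode" so that tostring() returns str rather than bytes, and the variable names are made up. Attribute order and the way the valueless allowfullscreen attribute is written may differ from the demo above depending on the parser and library versions.

```python
from lxml import etree
import lxml.html

markup = '<iframe width="560" height="315" src="" frameborder="0" allowfullscreen></iframe>'

# fragment_fromstring returns the single <iframe> element directly,
# so no root.find('iframe') step is needed here.
iframe = lxml.html.fragment_fromstring(markup)

# Default method is "xml": an element with no children or text
# is collapsed into the self-closing form.
as_xml = etree.tostring(iframe, encoding="unicode")

# method="html" applies HTML serialization rules and keeps
# the explicit closing tag.
as_html = etree.tostring(iframe, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```

The first line printed should end in "/>" and the second in "></iframe>", which is the behaviour the question asks for.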
alecxe answered Jun 08 '26 15:06