Why does this element in lxml include the tail?

Question

Consider this Python script:

from lxml import etree

html = '''
<html xmlns="http://www.w3.org/1999/xhtml">
<head></head>
  <body>
    <p>This is some text followed with 2 citations.<span class="footnote">1</span>
       <span class="footnote">2</span>This is some more text.</p>
  </body>
</html>'''

tree = etree.fromstring(html)

for element in tree.findall(".//{*}span"):
    if element.get("class") == 'footnote':
        print(etree.tostring(element, encoding="unicode", pretty_print=True))

The desired output would be just the two span elements; instead I get:

<span xmlns="http://www.w3.org/1999/xhtml" class="footnote">1</span>
<span xmlns="http://www.w3.org/1999/xhtml" class="footnote">2</span>This is some more text.

Why does it include the text after the element until the end of the parent element?

I'm trying to use lxml to link footnotes. When I call a.insert() to put the span element into the a element I create for it, the text after the span comes along with it, so large amounts of text I don't want linked end up inside the link.

falsetru · Accepted Answer

Specifying with_tail=False leaves the tail text out of the serialized output:

print(etree.tostring(element, encoding="unicode", pretty_print=True, with_tail=False))

See the lxml.etree.tostring documentation.

Lennart Regebro · Answer

It includes the text after the element, because that text belongs to the element.

If you don't want that text to belong to the previous span, it needs to be contained in its own element. However, you can avoid printing this text when converting the element back to XML by passing with_tail=False to etree.tostring().

You can also simply set the element's tail to '' (or None) if you want to remove that text from a specific element. Note that this deletes the text from the tree, not just from the output.
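For the footnote-linking use case in the question, one approach is to move the tail rather than delete it: give the span's tail to the new a element before moving the span inside it, so the text stays after the link. A minimal sketch, assuming a simplified document; the "#fn1", "#fn2" href scheme is hypothetical and only for illustration:

```python
from lxml import etree

XHTML = "http://www.w3.org/1999/xhtml"

doc = etree.fromstring(
    '<p xmlns="http://www.w3.org/1999/xhtml">Text.'
    '<span class="footnote">1</span> '
    '<span class="footnote">2</span>More text.</p>'
)

# findall() returns a list, so modifying the tree inside the loop is safe.
for i, span in enumerate(doc.findall(".//{*}span"), start=1):
    if span.get("class") != "footnote":
        continue
    # Hypothetical link target, for illustration only.
    link = etree.Element(f"{{{XHTML}}}a", href=f"#fn{i}")
    # Insert the new link where the span currently is.
    span.addprevious(link)
    # Hand the span's tail to the link; otherwise moving the span would
    # carry the tail into the link along with it.
    link.tail, span.tail = span.tail, None
    # Move the span inside the link.
    link.append(span)

print(etree.tostring(doc, encoding="unicode"))
```

Clearing span.tail before the move matters because lxml moves an element together with its tail.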


Tags: python, html, lxml

Asked by: jorbas
