Suppose I have this sort of HTML from which I need to select "text2" using lxml / ElementTree:
<div>text1<span>childtext1</span>text2<span>childtext2</span>text3</div>
If I already have the div element as mydiv, then mydiv.text returns just "text1".
Using itertext() seems problematic or cumbersome at best since it walks the entire tree under the div.
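To see why itertext() is awkward here, the sketch below uses the standard-library xml.etree.ElementTree (an assumption; lxml elements behave the same way for this call). It parses the same fragment and lists what itertext() yields. The text inside the child spans is interleaved with the div's own text chunks, so you would have to filter it out yourself.

```python
import xml.etree.ElementTree as ET

fragment = '<div>text1<span>childtext1</span>text2<span>childtext2</span>text3</div>'
mydiv = ET.fromstring(fragment)

# itertext() walks the whole subtree, so child text (childtext1, childtext2)
# is mixed in with the div's own text chunks.
chunks = list(mydiv.itertext())
print(chunks)  # ['text1', 'childtext1', 'text2', 'childtext2', 'text3']
```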
Is there any simple/elegant way to extract a non-first text chunk from an element?
Well, lxml.etree supports XPath, and the `text()` node test lets you address each text node that is a direct child of the element:
>>> import lxml.etree
>>> fragment = '<div>text1<span>childtext1</span>text2<span>childtext2</span>text3</div>'
>>> div = lxml.etree.fromstring(fragment)
>>> div.xpath('./text()')
['text1', 'text2', 'text3']
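If you are on the standard-library xml.etree.ElementTree instead, its limited XPath subset does not support `text()`. You can still get the same chunks from the element model: `elem.text` is the text before the first child, and each child's `.tail` is the text after that child's closing tag, still inside the parent. The helper name `direct_text_chunks` below is hypothetical, just a minimal sketch of that idea.

```python
import xml.etree.ElementTree as ET

fragment = '<div>text1<span>childtext1</span>text2<span>childtext2</span>text3</div>'
div = ET.fromstring(fragment)

def direct_text_chunks(elem):
    """Hypothetical helper: return the text nodes directly under elem, in order.

    elem.text holds the text before the first child; each child's .tail holds
    the text that follows that child, still inside elem.
    """
    chunks = [elem.text] + [child.tail for child in elem]
    # A position with no text is None rather than '', so drop those entries.
    return [c for c in chunks if c is not None]

print(direct_text_chunks(div))  # ['text1', 'text2', 'text3']

# "text2" specifically is the tail of the first <span>.
print(div[0].tail)  # text2
```

With lxml you can also pick the second chunk directly in the XPath expression, e.g. `div.xpath('./text()[2]')`, which returns a one-element list containing 'text2'.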