I'm completely stumped why lxml .text
will give me the text for a child tag but for the root tag.
some_tag = etree.fromstring('<some_tag class="abc"><strong>Hello</strong> World</some_tag>')
some_tag.find("strong")
Out[195]: <Element strong at 0x7427d00>
some_tag.find("strong").text
Out[196]: 'Hello'
some_tag
Out[197]: <Element some_tag at 0x7bee508>
some_tag.text
some_tag.find("strong").text
returns the text between the <strong>
tag.
I expect some_tag.text
to return everything between <some_tag> ... </some_tag>
Expected:
<strong>Hello</strong> World
Instead, it returns nothing.
lxml. etree supports parsing XML in a number of ways and from all important sources, namely strings, files, URLs (http/ftp) and file-like objects. The main parse functions are fromstring() and parse(), both called with the source as first argument.
lxml is a Python library which allows for easy handling of XML and HTML files, and can also be used for web scraping. There are a lot of off-the-shelf XML parsers out there, but for better results, developers sometimes prefer to write their own XML and HTML parsers.
lxml provides a very simple and powerful API for parsing XML and HTML. It supports one-step parsing as well as step-by-step parsing using an event-driven API (currently only for XML).
from lxml import etree
XML = '<some_tag class="abc"><strong>Hello</strong> World</some_tag>'
some_tag = etree.fromstring(XML)
for element in some_tag:
print element.tag, element.text, element.tail
Output:
strong Hello World
For information on the .text
and .tail
properties, see:
To get exactly the result that you expected, use
print etree.tostring(some_tag.find("strong"))
Output:
<strong>Hello</strong> World
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With