This is such a basic question that I actually can't find it in the docs :-/
In the following:
img = house_tree.xpath('//img[@id="mainphoto"]')[0]
How do I get the HTML of the <img/> tag?
I've tried adding html_content() but get AttributeError: 'lxml.etree._Element' object has no attribute 'html_content'.
Also, if it was a tag with some content inside (e.g. <p>text</p>), how would I get the content (e.g. text)?
Many thanks!
lxml.etree supports parsing XML in a number of ways and from all important sources, namely strings, files, URLs (http/ftp) and file-like objects. The main parse functions are fromstring() and parse(), both called with the source as first argument.
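As a rough illustration of the two parse functions mentioned above, here is a minimal sketch. The XML payload and variable names are made up for the example; only fromstring() and parse() come from the text.

```python
from io import BytesIO

from lxml import etree

# Hypothetical sample data, not from the original thread.
XML = b"<root><item>one</item><item>two</item></root>"

# fromstring() takes the markup itself and returns the root element.
root = etree.fromstring(XML)

# parse() takes a source (filename, URL, or file-like object)
# and returns an ElementTree wrapping the root element.
tree = etree.parse(BytesIO(XML))

items_from_string = [e.text for e in root.findall("item")]
items_from_parse = [e.text for e in tree.getroot().findall("item")]
print(items_from_string, items_from_parse)
```

Both routes end up with the same element API; the difference is only whether you hand over the data or a source to read it from.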
lxml provides a very simple and powerful API for parsing XML and HTML. It supports one-step parsing as well as step-by-step parsing using an event-driven API (currently only for XML).
lxml is a Python library which allows for easy handling of XML and HTML files, and can also be used for web scraping. There are a lot of off-the-shelf XML parsers out there, but for better results, developers sometimes prefer to write their own XML and HTML parsers.
I suppose it will be as simple as:
from lxml.etree import tostring
inner_html = tostring(img)
Note that tostring() returns bytes by default; pass encoding="unicode" if you want a str.
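For what it's worth, here is a small self-contained sketch of serializing the matched element. The sample page is invented for the example, and using lxml.html.tostring alongside etree.tostring is my own addition rather than something the question requires.

```python
from lxml import etree, html

# Hypothetical sample page, not from the original question.
PAGE = (
    '<html><body>'
    '<img id="mainphoto" src="house.jpg" alt="House">'
    '<p>after</p>'
    '</body></html>'
)

house_tree = html.fromstring(PAGE)
img = house_tree.xpath('//img[@id="mainphoto"]')[0]

# etree.tostring serializes as XML; encoding="unicode" yields str instead of bytes,
# and with_tail=False leaves out any text that follows the element.
img_xml = etree.tostring(img, encoding="unicode", with_tail=False)

# html.tostring serializes with HTML rules, so void tags are not self-closed.
img_html = html.tostring(img, encoding="unicode", with_tail=False)

print(img_xml)
print(img_html)
```

Which serializer fits probably depends on whether the result should be valid XML or plain HTML.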
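As for getting the content from inside a <p>, for some selected element el:
content = el.text_content()
Note that text_content() is only available on elements parsed with lxml.html; for a plain lxml.etree element, use el.text or "".join(el.itertext()) instead.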
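To make the text-extraction options concrete, here is a hedged sketch comparing the lxml.html and lxml.etree routes. The snippet and variable names are hypothetical and only serve the illustration.

```python
from lxml import etree, html

# Hypothetical markup with nested content, not from the original question.
SNIPPET = "<div><p>Hello <b>bold</b> world</p></div>"

# Parsed with lxml.html: elements are HtmlElement and provide text_content().
p_html = html.fromstring(SNIPPET).xpath("//p")[0]
text_via_html = p_html.text_content()

# Parsed with lxml.etree: plain elements do not have text_content().
p_xml = etree.fromstring(SNIPPET).xpath("//p")[0]
direct_text = p_xml.text                 # only the text before the first child
all_text = "".join(p_xml.itertext())     # text of the element and its descendants
xpath_text = p_xml.xpath("string()")     # same idea, expressed in XPath

print(text_via_html, direct_text, all_text, xpath_text, sep=" | ")
```

For a simple <p>text</p> with no child tags, el.text alone is enough; the other forms matter once the element contains nested markup.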