I am trying to get the HTML content of child nodes with lxml and XPath in Python. As shown in the code below, I want to find the HTML content of each of the product nodes. Is there any method like product.html?
    productGrids = tree.xpath("//div[@class='name']/parent::*")
    for product in productGrids:
        print(...)  # HTML content of product?
lxml provides a very simple and powerful API for parsing XML and HTML. It supports one-step parsing as well as step-by-step parsing using an event-driven API (currently only for XML).
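To make the two parsing styles concrete, here is a minimal sketch using a made-up XML snippet: `etree.fromstring()` parses the whole document into a tree in one step, while `etree.iterparse()` reports elements as the parser reaches them. The sample document and the choice of `"end"` events filtered by `tag="item"` are assumptions for illustration only.

```python
import io

from lxml import etree

# Invented sample document, used only for this illustration.
xml_bytes = b"<root><item>a</item><item>b</item></root>"

# One-step parsing: the whole document becomes an element tree at once.
root = etree.fromstring(xml_bytes)
one_step = [item.text for item in root]
print(one_step)  # ['a', 'b']

# Step-by-step (event-driven) parsing: iterparse yields (event, element)
# pairs while reading the input; here only "end" events for <item> tags.
step_by_step = []
for event, elem in etree.iterparse(io.BytesIO(xml_bytes), events=("end",), tag="item"):
    # On an "end" event the element and its text are fully parsed.
    step_by_step.append(elem.text)
print(step_by_step)  # ['a', 'b']
```

The event-driven form is mainly useful for large inputs, where you can process and discard elements without holding the full tree in memory.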
Parsing a string with lxml.html.fromstring gives us an object of type HtmlElement. This object has an xpath method that we can use to query the HTML document, which gives us a structured way to extract information from it.
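A minimal sketch of that flow, assuming a small invented HTML fragment: `lxml.html.fromstring()` returns an `HtmlElement`, and calling its `xpath()` method evaluates an XPath expression against the parsed document.

```python
from lxml import html

# Invented HTML fragment; a single top-level element is returned as-is.
doc = html.fromstring('<div><p class="x">hi</p><p>bye</p></div>')

print(type(doc).__name__)  # HtmlElement

# xpath() returns a list; text() selects the text nodes of matching <p>s.
matches = doc.xpath("//p[@class='x']/text()")
print(matches)  # ['hi']
```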
lxml.etree supports the simple path syntax of the find, findall and findtext methods on ElementTree and Element, as known from the original ElementTree library (ElementPath).
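The sketch below exercises the three ElementPath methods on an invented `<catalog>` document; the element names are placeholders, not part of any real schema.

```python
from lxml import etree

# Invented sample document.
root = etree.fromstring(
    "<catalog><book><title>A</title></book><book><title>B</title></book></catalog>"
)

# findall: every matching direct child, as a list of elements.
titles = [book.findtext("title") for book in root.findall("book")]
# ".//" searches descendants at any depth.
all_titles = root.findall(".//title")
# find: the first match, or None when nothing matches.
missing = root.find("magazine")

print(titles)                       # ['A', 'B']
print(len(all_titles))              # 2
print(missing)                      # None
print(root.findtext("book/title"))  # A
```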
    from lxml import etree

    root = etree.fromstring("<root><child>text</child></root>")
    print(etree.tostring(root, pretty_print=True, encoding="unicode"))
You can find more examples here: http://lxml.de/tutorial.html
I believe you want to use the tostring() method:
    from lxml import etree

    tree = etree.fromstring(
        '<html><head><title>foo</title></head><body><div class="name"><p>foo</p></div>'
        '<div class="name"><ul><li>bar</li></ul></div></body></html>'
    )
    for elem in tree.xpath("//div[@class='name']"):
        # pretty_print ensures that it is nicely formatted;
        # encoding="unicode" returns a str rather than bytes on Python 3.
        print(etree.tostring(elem, pretty_print=True, encoding="unicode"))