Get the inner HTML of an element in lxml

Tags:

python

xpath

lxml

I am trying to get the HTML content of a child node with lxml and XPath in Python. As shown in the code below, I want to get the HTML content of each of the product nodes. Is there a method like product.html?

productGrids = tree.xpath("//div[@class='name']/parent::*")
for product in productGrids:
    print  # HTML content of product

Sudip Kafle asked Feb 15 '13 13:02



2 Answers

from lxml import etree

# root is the element to serialize, e.g. one item from an xpath() result
print(etree.tostring(root, pretty_print=True))

You can find more examples in the lxml tutorial: http://lxml.de/tutorial.html

Walty Yeung answered Oct 07 '22 13:10


I believe you want to use the tostring() method:

from lxml import etree

tree = etree.fromstring('<html><head><title>foo</title></head><body><div class="name"><p>foo</p></div><div class="name"><ul><li>bar</li></ul></div></body></html>')
for elem in tree.xpath("//div[@class='name']"):
    # pretty_print ensures that it is nicely formatted;
    # encoding="unicode" returns str rather than bytes on Python 3.
    print(etree.tostring(elem, pretty_print=True, encoding="unicode"))

vezult answered Oct 07 '22 12:10