Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get path of an element in lxml?

Tags:

python

xpath

lxml

I'm searching in a HTML document using XPath from lxml in python. How can I get the path to a certain element? Here's the example from ruby nokogiri:

page.xpath('//text()').each do |textnode|     path = textnode.path     puts path end 

print for example '/html/body/div/div[1]/div[1]/p/text()[1]' and this is the string I want to get in python.

like image 842
Fluffy Avatar asked Oct 16 '09 10:10

Fluffy


People also ask

What is XPath in lxml?

lxml. etree supports the simple path syntax of the find, findall and findtext methods on ElementTree and Element, as known from the original ElementTree library (ElementPath).

What is Etree in lxml?

lxml. etree supports parsing XML in a number of ways and from all important sources, namely strings, files, URLs (http/ftp) and file-like objects. The main parse functions are fromstring() and parse(), both called with the source as first argument.

Is XML and lxml are same?

lxml is a Python library which allows for easy handling of XML and HTML files, and can also be used for web scraping. There are a lot of off-the-shelf XML parsers out there, but for better results, developers sometimes prefer to write their own XML and HTML parsers.


1 Answers

Use getpath from ElementTree objects.

from lxml import etree      root = etree.fromstring('''     <foo><bar>Data</bar><bar><baz>data</baz>     <baz>data</baz></bar></foo>     ''')      tree = etree.ElementTree(root) for e in root.iter():     print(tree.getpath(e)) 

Prints

/foo /foo/bar[1] /foo/bar[2] /foo/bar[2]/baz[1] /foo/bar[2]/baz[2] 
like image 117
nosklo Avatar answered Sep 22 '22 01:09

nosklo