Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: Using xpath locally / on a specific element

Tags:

python

xpath

lxml

I'm trying to get the links from a page with xpath. The problem is that I only want the links inside a table, but if I apply the xpath expression on the whole page I'll capture links which I don't want.

For example:

tree = lxml.html.parse(some_response) links = tree.xpath("//a[contains(@href, 'http://www.example.com/filter/')]") 

The problem is that applies the expression to the whole document. I located the element I want, for example:

tree = lxml.html.parse(some_response) root = tree.getroot() table = root[1][5] #for example links = table.xpath("//a[contains(@href, 'http://www.example.com/filter/')]") 

But that seems to be performing the query in the whole document as well, as I still am capturing the links outside of the table. This page says that "When xpath() is used on an Element, the XPath expression is evaluated against the element (if relative) or against the root tree (if absolute):". So, what I using is an absolute expression and I need to make it relative? Is that it?

Basically, how can I go about filtering only elements that exist inside of this table?

like image 519
pvt pns Avatar asked Jan 24 '11 18:01

pvt pns


People also ask

How do you select an element by XPath?

Go to the First name tab and right click >> Inspect. On inspecting the web element, it will show an input tag and attributes like class and id. Use the id and these attributes to construct XPath which, in turn, will locate the first name field.

How do I find the XPath of an element in Python?

To find the XPath for a particular element on a page:Right-click the element in the page and click on Inspect. Right click on the element in the Elements Tab. Click on copy XPath.

Can we use XPath in python?

XPath Standard Functions. XPath includes over 200 built-in functions. There are functions for string values, numeric values, booleans, date and time comparison, node manipulation, sequence manipulation, and much more. These expressions can be used in JavaScript, Java, XML Schema, Python, and lots of other languages.


1 Answers

Your xpath starts with a slash (/) and is therefore absolute. Add a dot (.) in front to make it relative to the current element i.e.

links = table.xpath(".//a[contains(@href, 'http://www.example.com/filter/')]") 
like image 115
phihag Avatar answered Sep 19 '22 02:09

phihag