Filtering out content with style display:none in an XPath expression

Tags: xpath

I'm trying to parse HTML with lxml in Python, and this is the markup I'm working with:

<td>
    <span style="display:inline">text1</span>
    <span style="display:none">text2</span>
    <span>text3</span>
    text4
</td>

I thought I could get away with the following:

tree = tr.xpath("//*[contains(@style,'inline')]/text()")

But then I realized this would only return text1. What I want is to get text3 and text4 as well, so that the output is:

['text1', 'text3', 'text4']

Can anyone point me in the right direction?

Asked by Clubmate, Jun 05 '12 15:06



1 Answer

Explicitly exclude anything with display:none:

tree = tr.xpath("//*[not(contains(@style,'display:none'))]/text()")

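That said, this is only a distant approximation of what a browser would actually do: it looks only at the inline style attribute of each text node's parent element, so it misses content hidden by a stylesheet or by a hidden ancestor. If you need strictly accurate results, you'd want to drive an actual browser (as with Selenium, embedding APIs, or the like).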
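As a rough sketch of how this plays out on the markup from the question: the expression also returns the whitespace-only text nodes that sit between the spans, so some cleanup in Python is probably needed to end up with the desired list. The SNIPPET string, the table/tr wrapper around the td, and the variable names are assumptions made for illustration. Note too that the match is a plain substring test, so a style written as "display: none" (with a space) would not be excluded.

```python
from lxml import html

# Assumed sample: the <td> from the question, wrapped in a table row so the
# HTML parser keeps the cell in place.
SNIPPET = """<table><tr><td>
    <span style="display:inline">text1</span>
    <span style="display:none">text2</span>
    <span>text3</span>
    text4
</td></tr></table>"""

tr = html.fromstring(SNIPPET)

# Text nodes whose parent element does not carry an inline display:none.
raw = tr.xpath("//*[not(contains(@style,'display:none'))]/text()")

# Drop whitespace-only nodes and trim the indentation around the rest.
visible = [t.strip() for t in raw if t.strip()]

print(visible)  # ['text1', 'text3', 'text4']
```

The stripping is done in Python rather than in the XPath expression to keep the query identical to the one above.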

Answered by Charles Duffy, Sep 20 '22 15:09