 

Get text content of an HTML element using XPath?

Consider this HTML:

    <div>
        <p>
        <span class="abc">Monitor</span> <b>$300</b>
        </p>
        <a href="/add">Add to cart</a>
    </div>
    <div>
        <p>
        <span class="abc">Keyboard</span> $20
        </p>
        <a href="/add">Add to cart</a>
    </div>

Using XPath, I want to extract "Monitor $300" and "Keyboard $20". I use this XPath expression:

 //div[a[contains(., "Add to cart")]]/p/text() 

But it selects <span class="abc">Monitor</span> <b>$300</b>. I don't want the tags. How do I get only the text?

Genghis Khan asked Jan 31 '13 at 17:01


People also ask

How do I get the text of an element using XPath?

To find an element by its text with Selenium (Java), use driver.findElement(By.xpath("//*[contains(text(),'the text you are searching for')]")); and then call getText() on the returned element to read its text.

Can I use XPath on HTML?

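Yes. XPath operates on a parsed document tree, and HTML parses into the same kind of element and text-node tree as XML, so once a page has been parsed (for example by an HTML parser such as lxml.html, or by a browser's DOM) it can be navigated with XPath much like an XML document.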

What is text () in XPath?

text() is not a Selenium function but an XPath node test: it selects the text-node children of the context node. Selenium, like any other tool that evaluates XPath, supports it, so it can be used to locate elements by their text; for example, //span[text()='Monitor'] matches a span whose own text node is exactly "Monitor". Because it only looks at direct text-node children, text nested inside child elements is not matched.


1 Answer

You want to select all descendant text, not just child text:

//div[a[contains(., "Add to cart")]]/p//text() 

Note the double slash between p and text() there.
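To see why the double slash matters, here is a minimal sketch using only the standard library's xml.etree.ElementTree, whose limited XPath subset has no text() step. The helpers child_text and descendant_text are hypothetical names: the first approximates p/text() (the element's own text plus the tail text after each child), the second approximates p//text() via Element.itertext(). The markup is wrapped in an assumed root element so it parses as a single XML document.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in an assumed <root> element so that
# ElementTree can parse it as a single XML document.
HTML = """<root>
<div>
    <p>
    <span class="abc">Monitor</span> <b>$300</b>
    </p>
    <a href="/add">Add to cart</a>
</div>
<div>
    <p>
    <span class="abc">Keyboard</span> $20
    </p>
    <a href="/add">Add to cart</a>
</div>
</root>"""


def child_text(elem):
    """Approximates XPath elem/text(): only the element's own text nodes,
    i.e. its leading text plus the tail text that follows each child."""
    parts = [elem.text] + [child.tail for child in elem]
    return [part for part in parts if part]


def descendant_text(elem):
    """Approximates XPath elem//text(): every text node inside elem."""
    return list(elem.itertext())


root = ET.fromstring(HTML)
for p in root.findall("./div/p"):
    print("p/text():  ", child_text(p))
    print("p//text(): ", descendant_text(p))
```

For the Monitor paragraph, child_text finds nothing but whitespace, because "Monitor" and "$300" sit inside the span and b elements; descendant_text picks them up.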

The //text() expression will also return the whitespace between tags, so you'll need to clean that up. An example using lxml:

    >>> import lxml.etree as ET
    >>> tree = ET.fromstring('''<div>
    ... <div>
    ...     <p>
    ...     <span class="abc">Monitor</span> <b>$300</b>
    ...     </p>
    ...     <a href="/add">Add to cart</a>
    ... </div>
    ... <div>
    ...     <p>
    ...     <span class="abc">Keyboard</span> $20 
    ...     </p>
    ...     <a href="/add">Add to cart</a>
    ... </div>
    ... </div>''')
    >>> tree.xpath('//div[a[contains(., "Add to cart")]]/p//text()')
    ['\n    ', 'Monitor', ' ', '$300', '\n    ', '\n    ', 'Keyboard', ' $20 \n    ']
    >>> res = _
    >>> [txt for txt in (txt.strip() for txt in res) if txt]
    ['Monitor', '$300', 'Keyboard', '$20']
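If you want one string per product ("Monitor $300") rather than separate pieces, you can join each paragraph's descendant text and collapse the whitespace; with lxml, calling p.xpath('normalize-space()') on each matched p does this inside XPath itself. The sketch below is a standard-library approximation, assuming the markup can be parsed as XML. normalize_space is a hypothetical helper, and because ElementTree's XPath subset has no contains(), the predicate [a='Add to cart'] only matches a link whose full text is exactly "Add to cart".

```python
import xml.etree.ElementTree as ET

# Same markup as in the question, wrapped in an assumed <root> element.
HTML = """<root>
<div>
    <p>
    <span class="abc">Monitor</span> <b>$300</b>
    </p>
    <a href="/add">Add to cart</a>
</div>
<div>
    <p>
    <span class="abc">Keyboard</span> $20
    </p>
    <a href="/add">Add to cart</a>
</div>
</root>"""


def normalize_space(text):
    """Roughly mimics XPath 1.0 normalize-space(): trims both ends and
    collapses internal whitespace runs to single spaces. (str.split()
    treats more characters as whitespace than XPath does.)"""
    return " ".join(text.split())


root = ET.fromstring(HTML)
# ElementTree has no contains(); [a='Add to cart'] selects divs that have
# an <a> child whose complete text is exactly "Add to cart".
products = [
    normalize_space("".join(p.itertext()))
    for p in root.findall(".//div[a='Add to cart']/p")
]
print(products)  # ['Monitor $300', 'Keyboard $20']
```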
Martijn Pieters answered Sep 22 '22 at 19:09