Consider this HTML:
    <div>
      <p>
        <span class="abc">Monitor</span> <b>$300</b>
      </p>
      <a href="/add">Add to cart</a>
    </div>
    <div>
      <p>
        <span class="abc">Keyboard</span> $20
      </p>
      <a href="/add">Add to cart</a>
    </div>
Using XPath, I want to extract "Monitor $300" and "Keyboard $20". I use this XPath:

    //div[a[contains(., "Add to cart")]]/p/text()

But it selects <span class="abc">Monitor</span> <b>$300</b>. I don't want the tags. How do I get only the text?
To find an element by the text it contains with Selenium WebDriver (Java), all you need is:

    driver.findElement(By.xpath("//*[contains(text(),'the text you are searching for')]"));
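Note that HTML and XML documents both parse into a tree of element, attribute, and text nodes, which is why the same XPath expressions can navigate either kind of document. HTML parsers are lenient, though, and may repair malformed markup, so the tree you query can differ slightly from the raw source.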
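To make that concrete, here is a small sketch, assuming lxml is installed, that runs one XPath expression against the question's markup parsed twice: once with lxml's lenient HTML parser and once with its strict XML parser. The names MARKUP and NAME_XPATH are illustrative only, and the markup is wrapped in a single root <div> so that it is also well-formed XML.

```python
import lxml.etree
import lxml.html

# The question's markup, wrapped in one root <div> so it is also well-formed XML
# (MARKUP is an illustrative name).
MARKUP = (
    '<div>'
    '<div> <p> <span class="abc">Monitor</span> <b>$300</b> </p> <a href="/add">Add to cart</a> </div>'
    '<div> <p> <span class="abc">Keyboard</span> $20 </p> <a href="/add">Add to cart</a> </div>'
    '</div>'
)

# Product names: the text inside <span> within the <p> of each "Add to cart" block.
NAME_XPATH = '//div[a[contains(., "Add to cart")]]/p/span/text()'

html_tree = lxml.html.fromstring(MARKUP)  # lenient HTML parser
xml_tree = lxml.etree.fromstring(MARKUP)  # strict XML parser

# The same expression works unchanged on both trees.
html_names = [str(t) for t in html_tree.xpath(NAME_XPATH)]
xml_names = [str(t) for t in xml_tree.xpath(NAME_XPATH)]
print(html_names)  # ['Monitor', 'Keyboard']
print(xml_names)   # ['Monitor', 'Keyboard']
```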
Strictly speaking, text() is not a Selenium function: it is an XPath node test that selects the text-node children of the context element, and Selenium simply passes the whole expression to the browser's XPath engine. Used inside a predicate, as in //*[contains(text(), '...')], it locates elements by the text they contain. The text to match is written as a string literal inside the expression.
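The same locator can be tried outside a browser with lxml. This is a sketch under the assumption that lxml's libxml2 XPath 1.0 engine behaves like a browser's for these simple expressions (both implement XPath 1.0); MARKUP is again an illustrative name. It also shows a common pitfall: in XPath 1.0, when text() is passed to contains(), only the first text child is compared.

```python
import lxml.etree

# The question's markup under a single root (MARKUP is an illustrative name).
MARKUP = (
    '<div>'
    '<div> <p> <span class="abc">Monitor</span> <b>$300</b> </p> <a href="/add">Add to cart</a> </div>'
    '<div> <p> <span class="abc">Keyboard</span> $20 </p> <a href="/add">Add to cart</a> </div>'
    '</div>'
)
root = lxml.etree.fromstring(MARKUP)

# Same pattern as the Selenium locator: any element whose text contains a string.
monitor_tags = [el.tag for el in root.xpath("//*[contains(text(), 'Monitor')]")]
print(monitor_tags)  # ['span']

# Pitfall: contains(text(), ...) converts the node-set to the string value of
# its FIRST text node. The second <p> starts with a whitespace-only text node,
# so "$20" is missed here.
first_only = root.xpath("//p[contains(text(), '$20')]")
print(first_only)  # []

# Testing every text child, or the element's whole string value, finds it.
per_node = root.xpath("//p[text()[contains(., '$20')]]")
whole_value = root.xpath("//p[contains(., '$20')]")
print(len(per_node), len(whole_value))  # 1 1
```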
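You want to select all descendant text nodes, not just the text nodes that are direct children of p:

    //div[a[contains(., "Add to cart")]]/p//text()

Note the double slash between p and text() there: p/text() selects only the text sitting directly inside <p>, whereas p//text() also includes the text inside <span> and <b>.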
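A quick way to see the difference between p/text() and p//text() is to run both against the question's markup with lxml. This is a sketch: the helper stripped and the MARKUP string are illustrative names, and whitespace-only text nodes are dropped so only the meaningful text remains.

```python
import lxml.etree

# The question's markup under a single root (MARKUP is an illustrative name).
MARKUP = (
    '<div>'
    '<div> <p> <span class="abc">Monitor</span> <b>$300</b> </p> <a href="/add">Add to cart</a> </div>'
    '<div> <p> <span class="abc">Keyboard</span> $20 </p> <a href="/add">Add to cart</a> </div>'
    '</div>'
)
root = lxml.etree.fromstring(MARKUP)


def stripped(nodes):
    """Hypothetical helper: strip each text node and drop whitespace-only ones."""
    return [s for s in (str(n).strip() for n in nodes) if s]


# Direct text children of <p> only: "Monitor", "$300", "Keyboard" live in <span>/<b>.
child_text = stripped(root.xpath('//div[a[contains(., "Add to cart")]]/p/text()'))
# All descendant text of <p>, including text inside <span> and <b>.
all_text = stripped(root.xpath('//div[a[contains(., "Add to cart")]]/p//text()'))

print(child_text)  # ['$20']
print(all_text)    # ['Monitor', '$300', 'Keyboard', '$20']
```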
This will potentially also include a lot of whitespace from between the tags, so you'll need to clean that up. Example using lxml:
    >>> import lxml.etree as ET
    >>> tree = ET.fromstring('''<div>
    ...   <div>
    ...     <p>
    ...       <span class="abc">Monitor</span> <b>$300</b>
    ...     </p>
    ...     <a href="/add">Add to cart</a>
    ...   </div>
    ...   <div>
    ...     <p>
    ...       <span class="abc">Keyboard</span> $20
    ...     </p>
    ...     <a href="/add">Add to cart</a>
    ...   </div>
    ... </div>''')
    >>> tree.xpath('//div[a[contains(., "Add to cart")]]/p//text()')
    ['\n      ', 'Monitor', ' ', '$300', '\n    ', '\n      ', 'Keyboard', ' $20\n    ']
    >>> res = _
    >>> [txt for txt in (txt.strip() for txt in res) if txt]
    ['Monitor', '$300', 'Keyboard', '$20']
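The flat list above no longer records which price belongs to which product. If you want "Monitor $300" and "Keyboard $20" as whole strings, one alternative (a different technique from the answer's, sketched here with lxml) is to select each <p> and evaluate normalize-space(.) on it, which takes the element's full string value, trims it, and collapses runs of whitespace into single spaces. MARKUP and products are illustrative names.

```python
import lxml.etree

# The question's markup under a single root (MARKUP is an illustrative name).
MARKUP = (
    '<div>'
    '<div> <p> <span class="abc">Monitor</span> <b>$300</b> </p> <a href="/add">Add to cart</a> </div>'
    '<div> <p> <span class="abc">Keyboard</span> $20 </p> <a href="/add">Add to cart</a> </div>'
    '</div>'
)
root = lxml.etree.fromstring(MARKUP)

# One <p> per product, so each product's name and price stay together.
products = [
    # normalize-space(.) uses the string value of <p> (all descendant text),
    # trims leading/trailing whitespace and collapses internal runs to one space.
    str(p.xpath('normalize-space(.)'))
    for p in root.xpath('//div[a[contains(., "Add to cart")]]/p')
]
print(products)  # ['Monitor $300', 'Keyboard $20']
```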