Please note: a more refined version of this question, with an appropriate answer, can be found here.
I would like to use the Selenium Python bindings to find elements with a given text on a web page. For example, suppose I have the following HTML:
<html>
    <head>...</head>
    <body>
        <someElement>This can be found</someElement>
        <someOtherElement>This can <em>not</em> be found</someOtherElement>
    </body>
</html>
I need to search by text and am able to find <someElement> using the following XPath:
//*[contains(text(), 'This can be found')]
I am looking for a similar XPath that lets me find <someOtherElement> using the plain text "This can not be found". The following does not work:
//*[contains(text(), 'This can not be found')]
I understand that this is because the nested em element "disrupts" the text of "This can not be found". Is it possible, using XPath, to ignore nested elements like the one above?
You can use //*[contains(., 'This can not be found')].
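The context node . is converted to its string value before it is compared with 'This can not be found'. That string value is the concatenation of all of its descendant text nodes ("This can not be found" here). By contrast, text() selects the individual text nodes ("This can " and " be found"), and in XPath 1.0 contains() only looks at the first of them, which is why your original expression fails.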
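To see the difference between text() and . outside a browser, here is a small sketch using Python's standard xml.etree.ElementTree. It is not an XPath engine and only mimics the two conversions, assuming the snippet is well-formed XML: el.text is the text before the first child (roughly the first text node), while itertext() walks every descendant text node, which corresponds to the XPath string value.

```python
import xml.etree.ElementTree as ET

# The <someOtherElement> from the question, parsed as XML.
SNIPPET = "<someOtherElement>This can <em>not</em> be found</someOtherElement>"
target = "This can not be found"


def first_text_node(el: ET.Element) -> str:
    # Roughly what XPath 1.0 contains(text(), ...) looks at: text() yields
    # several text nodes, and only the first one is used for the comparison.
    return el.text or ""


def string_value(el: ET.Element) -> str:
    # What contains(., ...) looks at: all descendant text joined together.
    return "".join(el.itertext())


el = ET.fromstring(SNIPPET)
print(repr(first_text_node(el)), target in first_text_node(el))
print(repr(string_value(el)), target in string_value(el))
```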
Be careful, though: because you are using //*, it will match ALL enclosing elements that contain this string.
In your example case, it will match:
<someOtherElement>
<body>
<html>!
You could restrict this by targeting specific element tags or a specific section of your document (for example a <table> or <div> with a known id or class).
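A minimal sketch of building such a restricted expression from Python. The helper names (xpath_literal, contains_text_xpath, find_by_text) are hypothetical and not part of Selenium; only find_elements and By.XPATH are real Selenium API. Because XPath 1.0 string literals have no escape syntax, search text containing an apostrophe has to be wrapped in the other quote character or split up with concat().

```python
def xpath_literal(s: str) -> str:
    """Quote s as an XPath 1.0 string literal.

    XPath 1.0 has no escape sequences, so a string that contains both
    quote characters must be assembled with concat().
    """
    if "'" not in s:
        return f"'{s}'"
    if '"' not in s:
        return f'"{s}"'
    pieces = []
    for i, part in enumerate(s.split("'")):
        if i > 0:
            pieces.append('"\'"')  # a lone apostrophe, wrapped in double quotes
        if part:
            pieces.append(f"'{part}'")
    return "concat(" + ", ".join(pieces) + ")"


def contains_text_xpath(text: str, tag: str = "*", scope: str = "") -> str:
    """XPath for elements named tag whose string value contains text.

    scope is an optional prefix selecting a container, for example
    "//div[@id='main']" (a hypothetical id), or "." to stay relative to
    an element you already have.
    """
    return f"{scope}//{tag}[contains(., {xpath_literal(text)})]"


def find_by_text(context, text: str, tag: str = "*", scope: str = ""):
    """Run the query on a Selenium WebDriver or WebElement."""
    # Imported lazily so the string helpers work without Selenium installed.
    from selenium.webdriver.common.by import By
    return context.find_elements(By.XPATH, contains_text_xpath(text, tag, scope))


print(contains_text_xpath("This can not be found", tag="someOtherElement"))
```

For example, find_by_text(driver, "This can not be found", tag="someOtherElement") only returns the <someOtherElement>, not <body> or <html>. find_by_text also accepts a WebElement in place of the driver; pass scope="." so the expression starts with .// and stays relative to that element instead of searching the whole page.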
Edit, for the OP's question in the comments about how to find the most deeply nested elements matching the text condition:
The accepted answer here suggests //*[count(ancestor::*) = max(//*/count(ancestor::*))] to select the most nested element. This requires XPath 2.0: max() and a function call used as a path step are not available in XPath 1.0.
I combined it with your substring condition and was able to test it here with this document:
<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <someOtherElement>This can <em>not</em> be found</someOtherElement>
</body>
</html>
and this XPath 2.0 expression:
//*[contains(., 'This can not be found')]
   [count(ancestor::*) = max(//*/count(./*[contains(., 'This can not be found')]/ancestor::*))]
And it matches the element containing "This can not be found most nested".
There probably is a more elegant way to do that.
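Keep in mind that browsers implement only XPath 1.0, so the 2.0 expression above cannot be passed to Selenium's find_element directly. One workaround is to fetch candidates with the XPath 1.0 part, //*[contains(., 'This can not be found')], and pick the deepest ones in Python. The sketch below does that selection offline, with xml.etree.ElementTree standing in for the browser DOM (an assumption made so it runs without a browser). It also shows a different, XPath 1.0-compatible notion of "most nested": elements that contain the text while none of their child elements do.

```python
import xml.etree.ElementTree as ET

# The second test document from above, parsed as XML (it is well-formed).
DOC = """<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <someOtherElement>This can <em>not</em> be found</someOtherElement>
</body>
</html>"""
TARGET = "This can not be found"


def string_value(el):
    # Concatenation of all descendant text nodes, like string(.) in XPath.
    return "".join(el.itertext())


def matches_with_depth(root, text):
    """Return (element, depth) pairs for elements whose string value contains text.

    depth is the number of ancestor elements, like count(ancestor::*).
    """
    found = []
    stack = [(root, 0)]
    while stack:
        el, depth = stack.pop()
        if text in string_value(el):
            found.append((el, depth))
        stack.extend((child, depth + 1) for child in el)
    return found


def deepest_matches(root, text):
    # Keep only the matches with the largest ancestor count,
    # which is what the XPath 2.0 expression above selects.
    found = matches_with_depth(root, text)
    if not found:
        return []
    best = max(depth for _, depth in found)
    return [el for el, depth in found if depth == best]


def innermost_matches(root, text):
    # Matches that have no matching child element; equivalent to the
    # XPath 1.0 filter //*[contains(., T)][not(*[contains(., T)])].
    return [el for el, _ in matches_with_depth(root, text)
            if not any(text in string_value(child) for child in el)]


root = ET.fromstring(DOC)
for el in deepest_matches(root, TARGET):
    print("deepest:", el.tag, repr(string_value(el)))
for el in innermost_matches(root, TARGET):
    print("innermost:", el.tag, repr(string_value(el)))
```

In the test document, deepest_matches returns only the "most nested" <someOtherElement>, while innermost_matches returns both <someOtherElement> elements, because each is the innermost element holding the full text. With Selenium, the innermost filter can be used as-is with find_elements(By.XPATH, ...); for the deepest filter, one option is to compute each candidate's depth with len(el.find_elements(By.XPATH, "ancestor::*")), at the cost of one extra browser round trip per candidate.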