
XPath: Find HTML element by *plain* text

Please note: A more refined version of this question, with an appropriate answer, can be found here.

I would like to use the Selenium Python bindings to find elements with a given text on a web page. For example, suppose I have the following HTML:

<html>
    <head>...</head>
    <body>
        <someElement>This can be found</someElement>
        <someOtherElement>This can <em>not</em> be found</someOtherElement>
    </body>
</html>

I need to search by text and am able to find <someElement> using the following XPath:

//*[contains(text(), 'This can be found')]

I am looking for a similar XPath that lets me find <someOtherElement> using the plain text "This can not be found". The following does not work:

//*[contains(text(), 'This can not be found')]

I understand that this is because the nested em element "disrupts" the text flow of "This can not be found". Is there an XPath that ignores nestings like the one above?

Michael Herrmann asked Sep 06 '13 10:09


1 Answer

You can use //*[contains(., 'This can not be found')].

The context node . is converted to its string-value (the concatenated text of the element and all its descendants) before it is compared to 'This can not be found'.
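To see why . succeeds where text() fails, it helps to look at the text nodes involved. In XPath 1.0, contains() turns a node-set argument into the string-value of its first node, so text() contributes only "This can ". The following is a minimal sketch that uses Python's standard xml.etree.ElementTree rather than a real XPath engine; the helper names text_nodes and string_value are made up for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<html><body>
<someElement>This can be found</someElement>
<someOtherElement>This can <em>not</em> be found</someOtherElement>
</body></html>"""

def text_nodes(elem):
    """Direct text-node children of elem, in document order (what text() selects)."""
    parts = [elem.text] + [child.tail for child in elem]
    return [p for p in parts if p]

def string_value(elem):
    """Text of elem and all its descendants, concatenated (what '.' becomes as a string)."""
    return "".join(elem.itertext())

root = ET.fromstring(DOC)
target = root.find(".//someOtherElement")
needle = "This can not be found"

print(text_nodes(target))                # ['This can ', ' be found']
print(needle in text_nodes(target)[0])   # False: only the first text node is compared
print(string_value(target))              # This can not be found
print(needle in string_value(target))    # True
```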

Be careful, though: because you are using //*, it will match ALL enclosing elements that contain this string.

In your example case, it will match:

  • <someOtherElement>
  • and <body>
  • and <html>!
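The effect can be reproduced outside a browser. This is a rough sketch that emulates //*[contains(., ...)] with Python's standard xml.etree.ElementTree (not an actual XPath evaluation); elements_containing is a hypothetical helper, and the document is the one from the question.

```python
import xml.etree.ElementTree as ET

DOC = """<html>
    <head>...</head>
    <body>
        <someElement>This can be found</someElement>
        <someOtherElement>This can <em>not</em> be found</someOtherElement>
    </body>
</html>"""

def elements_containing(root, needle):
    """Tags of all elements whose full text content contains needle, in document order."""
    return [e.tag for e in root.iter() if needle in "".join(e.itertext())]

root = ET.fromstring(DOC)
matches = elements_containing(root, "This can not be found")
print(matches)  # ['html', 'body', 'someOtherElement']
```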

You could restrict this by targeting specific element tags or a specific section of your document (for example, a <table> or <div> with a known id or class).
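With the Selenium Python bindings, such a restriction might look like the sketch below. It assumes driver is a Selenium WebDriver instance and that the page has a <div id="content">; that id and the helper names plain_text_xpath and find_by_plain_text are hypothetical. For simplicity it also assumes the search text contains no single quote.

```python
def plain_text_xpath(text, scope="//*"):
    """Build an XPath for elements in `scope` whose string-value contains `text`."""
    if "'" in text:
        # Assumption of this sketch: no quoting logic for single quotes
        raise ValueError("text must not contain a single quote")
    return "%s[contains(., '%s')]" % (scope, text)

def find_by_plain_text(driver, text, scope="//*"):
    """Return all matching elements; "xpath" is the value of Selenium's By.XPATH."""
    return driver.find_elements("xpath", plain_text_xpath(text, scope))

# Unrestricted: also matches <body> and <html>
print(plain_text_xpath("This can not be found"))
# Restricted to descendants of a (hypothetical) <div id="content">
print(plain_text_xpath("This can not be found", "//div[@id='content']//*"))
```

Calling find_by_plain_text(driver, "This can not be found", "//div[@id='content']//*") would then only consider elements inside that div.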


Edit, in response to the OP's comment asking how to find the most nested elements matching the text condition:

The accepted answer here suggests //*[count(ancestor::*) = max(//*/count(ancestor::*))] to select the most nested element. This requires XPath 2.0: max() and a function call used as a path step are not available in XPath 1.0.

Combined with your substring condition, I was able to test it here with this document:

<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <someOtherElement>This can <em>not</em> be found</someOtherElement>
</body>
</html>

and with this XPath 2.0 expression:

//*[contains(., 'This can not be found')]
   [count(ancestor::*) = max(//*/count(./*[contains(., 'This can not be found')]/ancestor::*))]

And it matches the element containing "This can not be found most nested".

There probably is a more elegant way to do that.
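One candidate that stays within XPath 1.0 is //*[contains(., 'This can not be found')][not(.//*[contains(., 'This can not be found')])], which keeps a matching element only if none of its descendants also matches. Note that this is not quite the same thing: it returns every innermost match, not just the single deepest one. The sketch below emulates that logic on the test document above with Python's standard xml.etree.ElementTree; it does not evaluate the XPath itself, and innermost_matches is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DOC = """<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <someOtherElement>This can <em>not</em> be found</someOtherElement>
</body>
</html>"""

def contains_text(elem, needle):
    """True if the full text content of elem contains needle."""
    return needle in "".join(elem.itertext())

def innermost_matches(root, needle):
    """Elements containing needle that have no descendant element also containing it."""
    result = []
    for elem in root.iter():
        if not contains_text(elem, needle):
            continue
        # elem.iter() yields elem itself first, so drop it to get the descendants
        descendants = list(elem.iter())[1:]
        if not any(contains_text(d, needle) for d in descendants):
            result.append(elem)
    return result

root = ET.fromstring(DOC)
for elem in innermost_matches(root, "This can not be found"):
    print(elem.tag, "|", "".join(elem.itertext()))
```

On this document both <someOtherElement> elements are returned, while <html>, <body> and <nested> are filtered out.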

paul trmbrth answered Oct 17 '22 21:10