
XPath: Find HTML element by plain text

Tags: html, xpath

Please note: This question is a more refined version of a previous question.

I am looking for an XPath that lets me find elements with a given plain text in an HTML document. For example, suppose I have the following HTML:

<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>
</body>
</html>

I need to search by text and am able to find <someElement> using the following XPath:

//*[contains(text(), 'This can be found')]

I am looking for a similar XPath that lets me find <someOtherElement> and <yetAnotherElement> using the plain text "This can not be found". The following does not work:

//*[contains(text(), 'This can not be found')]

I understand that this is because the nested em element "disrupts" the text flow of "This can not be found". Is it possible in XPath to ignore nestings like the one above?
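The "disruption" can be made visible outside XPath as well. The following is a minimal sketch using Python's standard xml.etree.ElementTree, not an XPath evaluation; the variable names are illustrative, and the head element is left out for brevity. It assumes that, for the elements in this example, the .text attribute corresponds to the first text node that contains(text(), ...) ends up testing in XPath 1.0.

```python
import xml.etree.ElementTree as ET

# The body of the example document from the question (head omitted).
DOC = """<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>
</body>"""

root = ET.fromstring(DOC)
target = root.find("yetAnotherElement")

# .text holds only the text before the first child element. For this element
# that is the first text node, i.e. the text before <em>.
first_text_node = target.text

# itertext() walks all descendant text nodes in document order; joining them
# gives the full plain text of the element, including the text inside <em>.
string_value = "".join(target.itertext())

print(repr(first_text_node))  # 'This can '
print(repr(string_value))     # 'This can not be found'
```

The search text never appears in the first text node alone, which is why the text()-based expression misses this element.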

asked Sep 09 '13 17:09 by Michael Herrmann



1 Answer

You can use

//*[contains(., 'This can not be found')]
   [not(.//*[contains(., 'This can not be found')])]

This XPath consists of two parts:

  1. //*[contains(., 'This can not be found')]: Here . is the context node, and contains() converts it to its string-value, which for an element is the concatenation of all its descendant text nodes. This part therefore selects all elements whose string-value contains 'This can not be found'. In the above example, these are <someOtherElement> and <yetAnotherElement>, but also their ancestors <nested>, <body> and <html>.
  2. [not(.//*[contains(., 'This can not be found')])]: This removes elements that have a descendant element whose string-value still contains 'This can not be found'. It removes the unwanted elements <nested>, <body> and <html> in the above example.
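The logic of the two predicates can be mimicked in plain Python to check which elements survive each step. This is only a rough sketch with the standard xml.etree.ElementTree module, which does not support contains() in its own XPath subset; the helper names string_value and innermost_matches are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The example document from the question.
DOC = """<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>
</body>
</html>"""

NEEDLE = "This can not be found"


def string_value(el):
    """Concatenate all descendant text nodes, like string(.) in XPath."""
    return "".join(el.itertext())


def innermost_matches(root, needle):
    """Return tags of elements that contain needle while no descendant does."""
    result = []
    for el in root.iter():
        # Part 1: the element's string-value must contain the needle.
        if needle not in string_value(el):
            continue
        # Part 2: drop the element if any descendant element also matches.
        # el.iter() yields el itself first, so skip the first item.
        descendants = list(el.iter())[1:]
        if any(needle in string_value(d) for d in descendants):
            continue
        result.append(el.tag)
    return result


root = ET.fromstring(DOC)

# Elements that pass part 1 only (ancestors included).
part_one = [el.tag for el in root.iter() if NEEDLE in string_value(el)]
# Elements that pass both parts.
matches = innermost_matches(root, NEEDLE)

print(part_one)
print(matches)
```

With the example document, part 1 alone keeps html, body, nested, someOtherElement and yetAnotherElement, and the second check narrows this down to the two innermost elements.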


answered Sep 28 '22 19:09 by Michael Herrmann