 

XPath select innertext

Tags: html, c#, text, xpath

I have this HTML/XML:

\t\t\t\t\t    \r\n\t\t
<a href="/test.aspx">
  <span class=test>
    <b>blabla</b>
  </span>
</a>
<br/>
this is the text I want
<br/>
<span class="test">
  <b>code: 123</b>
</span>
<br/>
<span class="test"></span>
\t\t\t\t\t\t\t\t\t\t\t\t\r\n\t\t\t

In C# 4 I use the HtmlAgilityPack library to select the node with XPath and read its InnerText property, which returns all the text inside the node, including the text of its descendants. How can I get only the text "this is the text I want"?

/text() only returns \t\t\t\t\t \r\n\t\t

asked Oct 06 '10 13:10 by peter




1 Answer

/div/text()

From the example given, this XPath will get you all text nodes that are direct children of the div element, in this case test2.

If you could elaborate on the question, we might be better able to help. The div contains three children: a span element, a text node and a b element. The span and b each have a text node child of their own. Using XPath you can select elements only (/div/*), text nodes only (/div/text()) or all node types (/div/node()).
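
To make the difference between the three selections concrete, here is a minimal sketch using HtmlAgilityPack. The markup is hypothetical, reconstructed only from the description above (a span, a text node "test2" and a b), and the class name XPathNodeKinds is made up for illustration. It assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using HtmlAgilityPack;

class XPathNodeKinds
{
    static void Main()
    {
        // Hypothetical markup matching the description: span, text node, b.
        var doc = new HtmlDocument();
        doc.LoadHtml("<div><span>test1</span>test2<b>test3</b></div>");

        // Elements only, text nodes only, all node types.
        string[] queries = { "/div/*", "/div/text()", "/div/node()" };

        foreach (var xpath in queries)
        {
            // SelectNodes returns null (not an empty collection) when nothing matches.
            var nodes = doc.DocumentNode.SelectNodes(xpath);
            if (nodes == null)
            {
                Console.WriteLine(xpath + " -> no match");
                continue;
            }

            foreach (var node in nodes)
            {
                // NodeType distinguishes Element from Text nodes.
                Console.WriteLine(xpath + " -> " + node.NodeType + ": " + node.OuterHtml);
            }
        }
    }
}
```

Note that /div/text() does not descend into the span or b, so their inner text is not part of that result.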

EDIT: /text() only returns the top-level text nodes, that is, the direct text children of the node you selected, not the text inside child elements. In this case I would expect it to return a node list containing 3 text nodes:

\t\t\t\t\t    \r\n\t\t 
this is the text I want
\t\t\t\t\t\t\t\t\t\t\t\t\r\n\t\t\t

Are you perhaps only selecting the first node in the resulting node list? That would explain why you only see the leading whitespace. There are also a few well-formedness issues; for example, your <br> should probably be <br/>.
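
One possible way to skip the whitespace-only text nodes is to filter them in the XPath itself with a normalize-space() predicate. The sketch below is not necessarily what the asker ended up using. It assumes the HtmlAgilityPack package is referenced, and it wraps the question's fragment in a hypothetical <div id="content"> so there is a parent node to select from. The class name DirectTextOnly is also made up.

```csharp
using System;
using HtmlAgilityPack;

class DirectTextOnly
{
    static void Main()
    {
        // The question's fragment, wrapped in a hypothetical <div id="content">.
        string html =
            "<div id=\"content\">\t\t\t\t\t    \r\n\t\t" +
            "<a href=\"/test.aspx\"><span class=test><b>blabla</b></span></a>" +
            "<br/>\r\nthis is the text I want\r\n<br/>" +
            "<span class=\"test\"><b>code: 123</b></span><br/>" +
            "<span class=\"test\"></span>\t\t\t\r\n\t\t\t</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // text() keeps only the direct text-node children of the div;
        // [normalize-space()] drops those that contain nothing but whitespace.
        var nodes = doc.DocumentNode.SelectNodes(
            "//div[@id='content']/text()[normalize-space()]");

        // SelectNodes returns null when there is no match.
        if (nodes == null)
        {
            Console.WriteLine("no non-empty text nodes found");
            return;
        }

        foreach (var node in nodes)
        {
            // Trim the surrounding line breaks left over from the markup.
            Console.WriteLine(node.InnerText.Trim());
        }
    }
}
```

Iterating over the whole node list, rather than taking only the first node, is what avoids ending up with just the leading whitespace. Text inside the span elements, such as "code: 123", is not a direct child of the div and so is not selected.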

answered Sep 22 '22 00:09 by Chris Cameron-Mills