
Only select text directly in node, not in child nodes

Tags: xpath, xquery

How does one retrieve the text in a node without selecting the text in the children?

<div id="comment">
     <div class="title">Editor's Description</div>
     <div class="changed">Last updated: </div>
     <br class="clear">
     Lorem ipsum dolor sit amet.
</div>

In other words, I want "Lorem ipsum dolor sit amet." rather than "Editor's DescriptionLast updated: Lorem ipsum dolor sit amet."

Moak asked Dec 19 '10 16:12



3 Answers

In the provided XML document:

<div id="comment">
      <div class="title">Editor's Description</div>
      <div class="changed">Last updated: </div>
      <br class="clear">
      Lorem ipsum dolor sit amet. 
</div> 

the top element, /div, has four child nodes that are text nodes. The first three of these are whitespace-only. The last of the four is the one that is wanted.
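The node layout described above can be checked with a small Python sketch. It is only an illustration: Python's standard library does not evaluate text() steps, so the code walks the DOM with xml.dom.minidom instead of running XPath. The br element is written self-closed (an assumption, so that a strict XML parser accepts the fragment), and the names XML, root and text_children are made up for this example.

```python
from xml.dom import minidom

# Fragment from the question. <br/> is self-closed here (an assumption)
# so that a strict XML parser accepts it.
XML = """<div id="comment">
     <div class="title">Editor's Description</div>
     <div class="changed">Last updated: </div>
     <br class="clear"/>
     Lorem ipsum dolor sit amet.
</div>"""

root = minidom.parseString(XML).documentElement

# Direct text-node children only: the DOM counterpart of /div/text().
text_children = [n for n in root.childNodes if n.nodeType == n.TEXT_NODE]

# Show each text node with its 1-based position, as XPath would count it.
for position, node in enumerate(text_children, start=1):
    print(position, repr(node.data))
```

With this parser the first three entries contain only a newline and indentation, and the fourth holds the sentence itself, surrounded by whitespace.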

Use:

/div/text()[last()]

This is different from:

/div/text()

The latter may (depending on whether whitespace-only nodes are preserved by the XML parser) select all 4 text nodes, but you only want the last of them.

An alternative, when you don't know exactly which text node you want, is:

/div/text()[normalize-space()]

This selects all text-node children of /div that are not whitespace-only.
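As a rough sketch of what the two predicates do, the following Python mimics them on the DOM. It assumes the standard-library xml.dom.minidom parser and a self-closed br (so the fragment is well-formed XML). The helper direct_text_nodes and the variable names are hypothetical, and the list operations only imitate the XPath expressions rather than evaluate them.

```python
from xml.dom import minidom

# Fragment from the question, with <br/> self-closed (an assumption).
XML = """<div id="comment">
     <div class="title">Editor's Description</div>
     <div class="changed">Last updated: </div>
     <br class="clear"/>
     Lorem ipsum dolor sit amet.
</div>"""

XML_WHITESPACE = " \t\r\n"  # the characters normalize-space() treats as whitespace


def direct_text_nodes(element):
    """Return the data of text nodes that are direct children of element."""
    return [n.data for n in element.childNodes if n.nodeType == n.TEXT_NODE]


root = minidom.parseString(XML).documentElement
texts = direct_text_nodes(root)

# Imitates /div/text()[last()]: the final text-node child.
last_text = texts[-1]

# Imitates /div/text()[normalize-space()]: drop whitespace-only text nodes.
non_blank = [t for t in texts if t.strip(XML_WHITESPACE)]

print(repr(last_text))
print(len(non_blank))
```

Either way the selected node keeps its leading and trailing whitespace. In XPath 1.0, wrapping the selection as normalize-space(/div/text()[last()]) returns the trimmed string instead of the node.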

Dimitre Novatchev answered Oct 23 '22 17:10


Just select text() instead of .:

div/text()

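On the given XML fragment, this selects only the text nodes directly under the div (including any whitespace-only ones the parser preserves), which amounts to: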

Lorem ipsum dolor sit amet.

Lucero answered Oct 23 '22 18:10


How about this:

$doc/node()[3]/text()

This assumes $doc holds the XML.

bosari answered Oct 23 '22 16:10