 

How to get node text without children?


I use Nokogiri to parse an HTML page with content like this:

<p class="parent">
  Useful text
  <br>
  <span class="child">Useless text</span>
</p>

When I call page.css('p.parent').text, Nokogiri returns 'Useful text Useless text' (plus surrounding whitespace), but I need only 'Useful text'.

How to get node text without children?

asked Aug 27 '13 at 16:08 by Denis Kreshikhin





1 Answer

XPath includes the text() node test for selecting text nodes, so you could do:

page.xpath('//p[@class="parent"]/text()')

Using XPath to select HTML classes can get tricky, though: @class="parent" matches only when the attribute value is exactly "parent", so it misses an element that also carries other classes. This might not be ideal.

Fortunately, Nokogiri adds the text() selector to CSS, so you can use:

page.css('p.parent > text()')

to get the text nodes that are direct children of p.parent. This will also return some nodes that contain only whitespace, so you may have to filter them out.

answered Nov 14 '22 at 17:11 by matt
