XPath to select only child elements (not blank text nodes)

I am parsing some XML using Nokogiri and XPath. When I do this:

doc.xpath('//Order/child::node()').each do |node|
  puts node.name
end

It prints the name of every node, but between the element names it also prints 'text'. I think I know why:

In my XML, there is whitespace between the elements, like this: "<a1>hi</a1> \n <a2>bye</a2>"

Is there a way to tell it to ignore the whitespace between elements?

0xSina asked Jan 17 '12 03:01


2 Answers

Use:

//Order/node()[not(self::text()[not(normalize-space())])]

This selects all child nodes of any Order element, except text nodes that consist entirely of whitespace.

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>


 <xsl:template match="/*">
     <xsl:variable name="vSel1" select="//Order/node()"/>
     <xsl:variable name="vSel2" select=
     "//Order/node()[not(self::text()[not(normalize-space())])]"/>

     <xsl:for-each select="$vSel1">
       <xsl:value-of select="concat('&#xA;',position(), ': ')"/>
       <xsl:copy-of select="."/>
       <xsl:text>&#xA;</xsl:text>
     </xsl:for-each>
================
     <xsl:for-each select="$vSel2">
       <xsl:value-of select="concat('&#xA;',position(), ': ')"/>
       <xsl:copy-of select="."/>
       <xsl:text>&#xA;</xsl:text>
     </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the following XML document:

<t>
 <Order>
  <a/>
  <b>xxx</b>
  <c/>
 </Order>
 <Order>
  <d/>
  <e>xxx</e>
  <f/>
 </Order>
</t>

the two XPath expressions are evaluated and the nodes each one selects are output, each node preceded by its position number:

1: 


2: <a/>

3: 


4: <b>xxx</b>

5: 


6: <c/>

7: 


8: 


9: <d/>

10: 


11: <e>xxx</e>

12: 


13: <f/>

14: 


================

1: <a/>

2: <b>xxx</b>

3: <c/>

4: <d/>

5: <e>xxx</e>

6: <f/>

Dimitre Novatchev answered Sep 27 '22 20:09


If you only want the elements, use a more specific XPath: a trailing /* step selects only the child elements:

require 'nokogiri'
doc = Nokogiri.XML("<r><a>1</a>\n\t<b>2</b></r>")
p doc.xpath('/r/child::node()').map(&:name)
#=> ["a", "text", "b"]

p doc.xpath('/r/*').map(&:name)
#=> ["a", "b"]

Alternatively, you can ask Nokogiri to throw away any text nodes that are only whitespace:

doc2 = Nokogiri.XML("<r><a>1</a>\n\t<b>2</b></r>",&:noblanks)
p doc2.xpath('/r/child::node()').map(&:name)
#=> ["a", "b"]

Or, you can use Ruby to filter your NodeSet further based on arbitrary criteria:

mine = doc.xpath('/r/child::node()').select do |node|
  node.type != Nokogiri::XML::Node::TEXT_NODE || node.content =~ /\S/
end
p mine.map(&:name)
#=> ["a", "b"]

Phrogz answered Sep 27 '22 22:09