Get children of an element without the text nodes

I am using Nokogiri with Ruby to interpret the contents of an XML file. I would like to get an array (or similar) of all elements that are direct children of <where> in my example. However, I am getting various text nodes (e.g. "\n\t\t\t"), which I do not want. Is there any way I can remove or ignore them?

require 'nokogiri'

@body = "
<xml>
  <request>
    <where>
      <username compare='e'>Admin</username>
      <rank compare='gt'>5</rank>
    </where>
  </request>
</xml>" # In my code the XML is indented with tabs rather than spaces; it is edited here for display purposes.

@noko = Nokogiri::XML(@body)
xml_request = @noko.xpath("//xml/request")
where = xml_request.xpath("where")
c = where.children
p c

The above Ruby script outputs:

[#<Nokogiri::XML::Text:0x100344c "\n\t\t\t">, #<Nokogiri::XML::Element:0x1003350 name="username" attributes=[#<Nokogiri::XML::Attr:0x10032fc name="compare" value="e">] children=[#<Nokogiri::XML::Text:0x1007580 "Admin">]>, #<Nokogiri::XML::Text:0x100734c "\n\t\t\t">, #<Nokogiri::XML::Element:0x100722c name="rank" attributes=[#<Nokogiri::XML::Attr:0x10071d8 name="compare" value="gt">] children=[#<Nokogiri::XML::Text:0x1006cec "5">]>, #<Nokogiri::XML::Text:0x10068a8 "\n\t\t">]

I would like to somehow obtain the following object:

[#<Nokogiri::XML::Element:0x1003350 name="username" attributes=[#<Nokogiri::XML::Attr:0x10032fc name="compare" value="e">] children=[#<Nokogiri::XML::Text:0x1007580 "Admin">]>, #<Nokogiri::XML::Element:0x100722c name="rank" attributes=[#<Nokogiri::XML::Attr:0x10071d8 name="compare" value="gt">] children=[#<Nokogiri::XML::Text:0x1006cec "5">]>]

Currently I can work around the issue using

c.each{|child|
  if !child.text?
    ...
  end
}

However, c.length == 5. It would make my life easier if someone could suggest how to exclude the direct child text nodes from c, so that c.length == 2.

asked Feb 14 '12 23:02 by SimonMayer



1 Answer

You have (at least) three options from which to choose:

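  1. Use c = where.element_children instead of c = where.children. Note that element_children is a method of a single node (Nokogiri::XML::Node). The where in the question comes from xpath and is a NodeSet, so fetch a single node first, e.g. where = @noko.at_xpath("//xml/request/where").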

  2. Select only the child elements directly:
    c = xml_request.xpath('./where/*') or
    c = where.xpath('./*')

  3. Filter the list of children to only those that are elements:
    c = where.children.select(&:element?)

answered Nov 04 '22 02:11 by Phrogz