I am using Nokogiri with Ruby to interpret the contents of an XML file. I would like to get an array (or similar) of all elements that are direct children of <where> in my example. However, I am also getting various text nodes (e.g. "\n\t\t\t"), which I do not want. Is there any way I can remove or ignore them?
require 'nokogiri'
@body = "
<xml>
<request>
<where>
<username compare='e'>Admin</username>
<rank compare='gt'>5</rank>
</where>
</request>
</xml>" #in my code, the XML contains tab-indentation, rather than spaces. It is edited here for display purposes.
@noko = Nokogiri::XML(@body)
xml_request = @noko.xpath("//xml/request")
where = xml_request.xpath("where")
c = where.children
p c
The above Ruby script outputs:
[#<Nokogiri::XML::Text:0x100344c "\n\t\t\t">, #<Nokogiri::XML::Element:0x1003350 name="username" attributes=[#<Nokogiri::XML::Attr:0x10032fc name="compare" value="e">] children=[#<Nokogiri::XML::Text:0x1007580 "Admin">]>, #<Nokogiri::XML::Text:0x100734c "\n\t\t\t">, #<Nokogiri::XML::Element:0x100722c name="rank" attributes=[#<Nokogiri::XML::Attr:0x10071d8 name="compare" value="gt">] children=[#<Nokogiri::XML::Text:0x1006cec "5">]>, #<Nokogiri::XML::Text:0x10068a8 "\n\t\t">]
I would like to somehow obtain the following object:
[#<Nokogiri::XML::Element:0x1003350 name="username" attributes=[#<Nokogiri::XML::Attr:0x10032fc name="compare" value="e">] children=[#<Nokogiri::XML::Text:0x1007580 "Admin">]>, #<Nokogiri::XML::Element:0x100722c name="rank" attributes=[#<Nokogiri::XML::Attr:0x10071d8 name="compare" value="gt">] children=[#<Nokogiri::XML::Text:0x1006cec "5">]>]
Currently I can work around the issue using
c.each{|child|
if !child.text?
...
end
}
but c.length == 5. It would make my life easier if someone could suggest how to exclude direct child text nodes from c, so that c.length == 2.
You have (at least) three options from which to choose:
1. Use c = where.element_children instead of c = where.children.
2. Select only the child elements directly: c = xml_request.xpath('./where/*') or c = where.xpath('./*')
3. Filter the list of children to only those that are elements: c = where.children.select(&:element?)