I am parsing an XML file using Nokogiri with the following snippet:
doc.xpath('//root').each do |root|
  puts "# ROOT found"
  root.xpath('//page').each do |page|
    puts "## PAGE found / #{page['id']} / #{page['name']} / #{page['width']} / #{page['height']}"
    page.children.each do |content|
      ...
    end
  end
end
How can I parse through all elements in the page element? There are three different elements: image, text and video. How can I make a case statement for each element?
Honestly, you look pretty close to me. Switch on each child's name:
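doc.xpath('//root').each do |root|
  puts "# ROOT found"
  # The leading dot keeps the search relative to this root node;
  # a bare '//page' would search the whole document again.
  root.xpath('.//page').each do |page|
    puts "## PAGE found / #{page['id']} / #{page['name']} / #{page['width']} / #{page['height']}"
    page.children.each do |child|
      # child.name is the tag name; whitespace text nodes match no branch.
      case child.name
      when 'image'
        do_image_stuff   # placeholder for your image handling
      when 'text'
        do_text_stuff    # placeholder for your text handling
      when 'video'
        do_video_stuff   # placeholder for your video handling
      end
    end
  end
end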
Both Nokogiri's CSS and XPath accessors allow multiple tags to be specified, which can be useful for this sort of problem. Rather than walking through every tag inside the document's page tag, you can search for just the tags you care about. Given this sample document:
require 'nokogiri'
doc = Nokogiri::XML('
<xml>
  <body>
    <image>image</image>
    <text>text</text>
    <video>video</video>
    <other>other</other>
    <image>image</image>
    <text>text</text>
    <video>video</video>
    <other>other</other>
  </body>
</xml>')
This is a search using CSS:
doc.search('image, text, video').each do |node|
  case node.name
  when 'image'
    puts node.text
  when 'text'
    puts node.text
  when 'video'
    puts node.text
  else
    puts 'should never get here'
  end
end
# >> image
# >> image
# >> text
# >> text
# >> video
# >> video
Notice that the output above lists the tags grouped in the order the CSS selector names them (this may vary between Nokogiri versions). If you need the tags in document order, you can use XPath:
doc.search('//image | //text | //video').each do |node|
  puts node.text
end
# >> image
# >> text
# >> video
# >> image
# >> text
# >> video
In either case, the program should run faster because all the searching occurs in libxml2, which returns only the nodes you need to Ruby for processing.
If you need to restrict the search to within a <page> tag, you can do a search up front to find the page node, then search underneath it:
doc.at('page').search('image, text, video').each do |node|
  ...
end
or
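# The leading dots keep the XPath search inside the page node;
# a bare '//image' would start again from the document root.
doc.at('//page').search('.//image | .//text | .//video').each do |node|
  ...
end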