Nokogiri and finding element by name

Question

I am parsing an XML file using Nokogiri with the following snippet:

doc.xpath('//root').each do |root|
  puts "# ROOT found"
  root.xpath('//page').each do |page|
    puts "## PAGE found / #{page['id']} / #{page['name']} / #{page['width']} / #{page['height']}"
    page.children.each do |content|
      ...
    end
  end
end

How can I parse through all elements in the page element? There are three different elements: image, text and video. How can I make a case statement for each element?

noli · Accepted Answer

Honestly, you look pretty close to me. Switch on each child's name:

doc.xpath('//root').each do |root|
  puts "# ROOT found"
  # './/page' keeps the search under this root; '//page' would scan the whole document
  root.xpath('.//page').each do |page|
    puts "## PAGE found / #{page['id']} / #{page['name']} / #{page['width']} / #{page['height']}"
    page.children.each do |child|
      case child.name
      when 'image'
        do_image_stuff
      when 'text'
        do_text_stuff
      when 'video'
        do_video_stuff
      end
    end
  end
end

the Tin Man · Answer

Both Nokogiri's CSS and XPath accessors accept multiple tags in a single selector, which is useful for this sort of problem. Rather than walking through every child of the page tag, you can ask for just the tags you want. Given this sample document:

require 'nokogiri'

doc = Nokogiri::XML('
  <xml>
  <body>
  <image>image</image>
  <text>text</text>
  <video>video</video>
  <other>other</other>
  <image>image</image>
  <text>text</text>
  <video>video</video>
  <other>other</other>
  </body>
  </xml>')

This is a search using CSS:

doc.search('image, text, video').each do |node|
  case node.name
  when 'image'
    puts node.text
  when 'text'
    puts node.text
  when 'video'
    puts node.text
  else
    puts 'should never get here'
  end
end

# >> image
# >> image
# >> text
# >> text
# >> video
# >> video

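Notice that the output above groups the tags in the order the CSS selector lists them. That grouping may depend on the Nokogiri version, so check the output on yours rather than relying on it. If you need the tags in document order, you can use XPath: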

doc.search('//image | //text | //video').each do |node|
  puts node.text
end

# >> image
# >> text
# >> video
# >> image
# >> text
# >> video

In either case, the program should run faster because all the searching happens inside libxml2, which hands back only the nodes Ruby needs to process.

If you need to restrict the search to within a <page> tag you can do a search up front to find the page node, then search underneath it:

doc.at('page').search('image, text, video').each do |node|
  ...
end

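or, with XPath (the leading dot keeps each path under the page node; a bare '//' would search the whole document):

doc.at('//page').search('.//image | .//text | .//video').each do |node|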
  ...
end
