
How to iterate XML nested elements with Nokogiri in Ruby

I'm trying to iterate over a folder structure in XML with Nokogiri, but I'm stuck on this:

<test>
   <folder name="Folder A">
      <folder name="Folder A1">
         <file name="a.txt">Cool file</file>
      </folder>
      <folder name="Folder A2"></folder>
   </folder>
   <folder name="Folder B">
      <folder name="Folder B1"></folder>
      <folder name="Folder B2">
         <folder name="Folder B21">
            <file name="b.txt"></file>
         </folder>
      </folder>
   </folder>
</test>

So, I want to iterate over this in order to be able to create a tree of folders and files (folders A1 and A2 are inside folder A, folders B1 and B2 are inside folder B, and folder B21 is inside folder B2).

So I'm doing this:

nodes = allnodes.xpath('//folder')
nodes.each do |node|
  puts "name => #{node.attributes['name']}"
end

but this lists all the folders at every level (A, A1, A2, B, B1, B2, B21). How can I get only the direct child folders of a node, without descending into them, so that I can pass each child to the same recursive function?

Thanks a lot for the help :)

Tiago asked Feb 19 '14 16:02


2 Answers

I'd do it as below:

require 'nokogiri'

doc = Nokogiri::XML(<<-xml)
<test>
   <folder name="Folder A">
      <folder name="Folder A1">
         <file name="a.txt">Cool file</file>
      </folder>
      <folder name="Folder A2"></folder>
   </folder>
   <folder name="Folder B">
      <folder name="Folder B1"></folder>
      <folder name="Folder B2">
         <folder name="Folder B21">
            <file name="b.txt"></file>
         </folder>
      </folder>
   </folder>
</test>
xml

# Collect every folder that has at least one child folder.
parent_folders = doc.xpath("//folder").select do |folder_node|
  folder_node.xpath("./folder").size > 0
end

# Map each parent folder's name to the names of its direct sub-folders.
parent_directory = parent_folders.each_with_object({}) do |parent_dir, dir_hash|
  dir_hash[parent_dir['name']] = parent_dir.xpath("./folder").collect do |sub_dir|
    sub_dir['name']
  end
end

parent_directory
# => {"Folder A"=>["Folder A1", "Folder A2"],
#     "Folder B"=>["Folder B1", "Folder B2"],
#     "Folder B2"=>["Folder B21"]}

You now have a hash, parent_directory, that maps each directory (key) to its direct sub-directories (value). Using the Hash#[] method, you can easily look up the sub-directories of a given directory. One example:

parent_directory['Folder A'] # => ["Folder A1", "Folder A2"]
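
One caveat: folders without sub-folders (Folder A1, for example) never become keys in this hash, so Hash#[] returns nil for them. Here is a minimal sketch of one way to handle that with Hash#fetch and a default value. The hash literal is written out by hand to mirror the direct-child relationships in the sample XML, and the helper name sub_directories is hypothetical:

```ruby
# Direct-child relationships for the sample XML, written out by hand
# so this snippet runs without parsing anything.
parent_directory = {
  "Folder A"  => ["Folder A1", "Folder A2"],
  "Folder B"  => ["Folder B1", "Folder B2"],
  "Folder B2" => ["Folder B21"]
}

# Hypothetical helper: returns the sub-directories of a folder, or an
# empty array when the folder has none (and therefore is not a key).
def sub_directories(hash, name)
  hash.fetch(name, [])
end

p sub_directories(parent_directory, "Folder B2") # prints ["Folder B21"]
p sub_directories(parent_directory, "Folder A1") # prints []
p parent_directory["Folder A1"]                  # prints nil
```

Returning an empty array keeps calling code free of nil checks when it iterates over the result.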
Arup Rakshit answered Nov 14 '22 14:11


When you use an XPath like //foo, you find foo elements at any level of the document, no matter which node you call it on. If you instead use ./foo (or just foo), you find only the direct children of the current node. Thus:

# Given an XML node, yields the node and all <file> children
# Then recursively does the same with every <folder> child
def process_files_and_folders(node, &blk)
  yield node, node.xpath('file')
  node.xpath('folder').each{ |folder| process_files_and_folders(folder,&blk) }
end

The keys to this are (a) recursion (having the method call itself for all the child folders) and (b) capturing the block passed by the user with the &blk notation, and then passing that block along to the later calls.

Seen in action:

require 'nokogiri'
doc = Nokogiri.XML(my_xml)  # my_xml holds the XML string from the question
process_files_and_folders( doc.root ) do |folder,files|
  depth  = folder.ancestors.length-1  # Just for pretty text output indenting
  indent = "  "*depth                 # Just for pretty text output indenting
  if folder['name']
    puts "#{indent}Processing the folder named #{folder['name']}"
  else
    puts "#{indent}No folder name; probably the root element."
  end
  unless files.empty?
    puts "#{indent}There are #{files.length} files in '#{folder['name']}':"
    files.each{ |file| print indent, file['name'], "\n" }
  end
end

Result:

No folder name; probably the root element.
  Processing the folder named Folder A
    Processing the folder named Folder A1
    There are 1 files in 'Folder A1':
    a.txt
    Processing the folder named Folder A2
  Processing the folder named Folder B
    Processing the folder named Folder B1
    Processing the folder named Folder B2
      Processing the folder named Folder B21
      There are 1 files in 'Folder B21':
      b.txt
Phrogz answered Nov 14 '22 14:11