I'm trying to iterate over a folder structure in XML with Nokogiri, but I'm stuck on this:
<test>
  <folder name="Folder A">
    <folder name="Folder A1">
      <file name="a.txt">Cool file</file>
    </folder>
    <folder name="Folder A2"></folder>
  </folder>
  <folder name="Folder B">
    <folder name="Folder B1"></folder>
    <folder name="Folder B2">
      <folder name="Folder B21">
        <file name="b.txt"></file>
      </folder>
    </folder>
  </folder>
</test>
So, I want to iterate over this in order to be able to create a tree of folders and files (folders A1 and A2 are inside folder A, folders B1 and B2 are inside folder B, and folder B21 is inside folder B2).
So I'm doing this:
nodes = allnodes.xpath('//folder')
nodes.each do |node|
  puts "name => #{node.attributes['name']}"
end
But this lists every folder at every level (A, A1, A2, B, B1, B2, B21). How can I select only the folders directly inside the current one, so that I can then pass each of them to the same recursive function?
Thanks a lot for the help :)
I'd do it as below:
require 'nokogiri'
doc = Nokogiri::XML(<<-xml)
<test>
  <folder name="Folder A">
    <folder name="Folder A1">
      <file name="a.txt">Cool file</file>
    </folder>
    <folder name="Folder A2"></folder>
  </folder>
  <folder name="Folder B">
    <folder name="Folder B1"></folder>
    <folder name="Folder B2">
      <folder name="Folder B21">
        <file name="b.txt"></file>
      </folder>
    </folder>
  </folder>
</test>
xml
# Collect every folder that has at least one direct sub-folder.
parent_folders = doc.xpath("//folder").select do |folder_node|
  folder_node.xpath("./folder").size > 0
end
# Map each parent folder's name to the names of its direct sub-folders.
parent_directory = parent_folders.each_with_object({}) do |parent_dir, dir_hash|
  dir_hash[parent_dir['name']] = parent_dir.xpath("./folder").collect do |sub_dir|
    sub_dir['name']
  end
end
parent_directory
# => {"Folder A"=>["Folder A1", "Folder A2"],
#     "Folder B"=>["Folder B1", "Folder B2"],
#     "Folder B2"=>["Folder B21"]}
Now you have a hash, parent_directory, which maps each directory (key) to its direct sub-directories (value). Using the Hash#[] method, you can easily look up the sub-directories of a given directory. For example:
parent_directory['Folder A'] # => ["Folder A1", "Folder A2"]
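Since the hash only records direct sub-folders, it can be walked recursively to print the whole tree. The following is a rough sketch, not part of the answer above: tree_lines is a hypothetical helper, the hash is written out literally so the snippet runs without Nokogiri, and it assumes folder names are unique and that every top-level folder has at least one sub-folder (otherwise it would not appear as a key).

```ruby
# Folder name => names of its direct sub-folders only.
parent_directory = {
  "Folder A"  => ["Folder A1", "Folder A2"],
  "Folder B"  => ["Folder B1", "Folder B2"],
  "Folder B2" => ["Folder B21"]
}

# Hypothetical helper: returns indented lines for `name` and everything below it.
def tree_lines(children_of, name, depth = 0)
  lines = ["#{'  ' * depth}#{name}"]
  # Folders without sub-folders have no key, so fall back to an empty list.
  children_of.fetch(name, []).each do |child|
    lines.concat(tree_lines(children_of, child, depth + 1))
  end
  lines
end

# Top-level folders are the keys that never appear as someone else's child.
top_level = parent_directory.keys - parent_directory.values.flatten
puts top_level.flat_map { |name| tree_lines(parent_directory, name) }
```

A flat name-to-children hash loses information as soon as two folders share a name, so for real data a nested structure built directly from the nodes is usually the safer choice.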
When you use an XPath like //foo, you find foo elements at any level of the document. If you instead use ./foo or just foo, you find only the child elements of the current node. Thus:
# Given an XML node, yields the node and all of its <file> children,
# then recursively does the same for every <folder> child.
def process_files_and_folders(node, &blk)
  yield node, node.xpath('file')
  node.xpath('folder').each { |folder| process_files_and_folders(folder, &blk) }
end
The keys to this are (a) recursion (having the method call itself for each child folder) and (b) capturing the block passed by the user with the &blk notation, then passing that block along to the later calls.
Seen in action:
require 'nokogiri'
# my_xml is a String containing the XML from the question
doc = Nokogiri.XML(my_xml)
process_files_and_folders(doc.root) do |folder, files|
  depth = folder.ancestors.length - 1 # Just for pretty text output indenting
  indent = " " * depth                # Just for pretty text output indenting
  if folder['name']
    puts "#{indent}Processing the folder named #{folder['name']}"
  else
    puts "#{indent}No folder name; probably the root element."
  end
  unless files.empty?
    puts "#{indent}There are #{files.length} files in '#{folder['name']}':"
    files.each { |file| print indent, file['name'], "\n" }
  end
end
Result:
No folder name; probably the root element.
 Processing the folder named Folder A
  Processing the folder named Folder A1
  There are 1 files in 'Folder A1':
  a.txt
  Processing the folder named Folder A2
 Processing the folder named Folder B
  Processing the folder named Folder B1
  Processing the folder named Folder B2
   Processing the folder named Folder B21
   There are 1 files in 'Folder B21':
   b.txt