
Building an XML tree from an Array of "strings/that/are/paths" (in Ruby)

What is the best way to build an XML tree in Ruby if you have an Array of string paths?


paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]
xml = 
<nodeA1>
    <nodeB1>
        <nodeC1>
            <nodeD1>
                <nodeE1/>
            </nodeD1>
        </nodeC1>
        <nodeC2/>
    </nodeB1>
    <nodeB2>
        <nodeC2/>
        <nodeC3/>
    </nodeB2>
</nodeA1>

My first thought is to split each path string into an array and compare its depth and content to the previous array. But after processing "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1", when I go back to "nodeA1/nodeB1/nodeC2", the node at index [1] is the common ancestor, and keeping track of that is messy, at least the way I've been doing it.

I would also like to make it recursive, so that each nesting level is processed in its own function call, but I haven't come up with a semi-universal solution yet.

Any ideas, or approaches you commonly use when you run into this problem?

Thanks! Lance

Lance asked Dec 29 '22 14:12


2 Answers

REXML is your friend! You're getting XPaths, so use 'em!

require 'rexml/document'

paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]

x = REXML::Document.new
x.elements << "xml"

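paths.each do |path|
  steps = path.split("/")
  steps.each_index do |i|
    # Skip this step if an element already exists at this depth.
    unless REXML::XPath.first(x, "/xml/" + steps[0..i] * "/")
      # Otherwise look up the parent (the steps before i) and append the new child.
      REXML::XPath.first(x, "/xml/" + steps[0...i] * "/").elements << steps[i]
    end
  end
end
puts x.to_s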

Note that your example data has both nodeA1 and nodeA3 at the top level, so I started with a root called "xml" here. If the "3" was a typo, and nodeA1 was really your root (as your sample XML output suggests), you can delete the 'x.elements << "xml"' line and change all the "/xml/"s to just "/".
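
Whether the wrapper root is needed can be checked up front. This is only a minimal sketch, assuming nothing beyond the paths array from the question; the variable name top_level is mine and not part of the code above:

```ruby
paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]

# Collect the distinct first segments; each one would become a top-level element.
top_level = paths.map { |path| path.split("/").first }.uniq

if top_level.size == 1
  puts "Single root: #{top_level.first}"
else
  puts "Multiple top-level names (#{top_level.join(', ')}), so a wrapper root is needed"
end
```

With the sample data this finds two top-level names, so the wrapper is required unless nodeA3 really is a typo.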

glenn mcdonald answered May 06 '23 06:05


This is very similar to this question. Here's a modified version based upon sris's answer:

paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]

tree = {}

paths.each do |path|
  current  = tree
  path.split("/").inject("") do |sub_path,dir|
    sub_path = File.join(sub_path, dir)
    current[sub_path] ||= {}
    current  = current[sub_path]
    sub_path
  end
end

def make_tree(prefix, node)
  tree = ""
  node.each_pair do |path, subtree| 
    tree += "#{prefix}<#{File.basename(path)}"
    if subtree.empty?
      tree += "/>\n"
    else
      tree += ">\n"
      tree += make_tree(prefix + "\t", subtree)
      tree += "#{prefix}</#{File.basename(path)}>\n"
    end
  end
  tree
end

xml = make_tree "", tree
print xml

Edit:

Here is a modified version that builds an actual XML document using Nokogiri. I think it's actually easier to follow than the string version. I also dropped File, since plain node names work fine as hash keys and nothing here needs path handling:

require 'nokogiri'

paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]

tree = {}

paths.each do |path|
  current  = tree
  path.split("/").each do |name|
    current[name] ||= {}
    current  = current[name]
  end
end

def make_tree(node, curr = nil, doc = Nokogiri::XML::Document.new)
  # You need a root node for the XML. Feel free to rename it.
  curr ||= doc.root = Nokogiri::XML::Node.new('root', doc)
  node.each_pair do |name, subtree|
    # add_child returns the newly added node. In current Nokogiri, << returns
    # the parent instead, which would attach every descendant directly to curr.
    child = curr.add_child(Nokogiri::XML::Node.new(name, doc))
    make_tree(subtree, child, doc) unless subtree.empty?
  end
  doc
end

xml = make_tree tree
print xml
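
To see what the tree-building loop produces before any XML is involved, here is a small sketch that wraps it in a helper and prints the nested Hash. The name build_tree is hypothetical, and the sketch assumes Ruby 1.9 or later, where a Hash keeps insertion order:

```ruby
require 'pp'

# Hypothetical helper: the same loop as above, wrapped in a method.
def build_tree(paths)
  tree = {}
  paths.each do |path|
    current = tree
    path.split("/").each do |name|
      current[name] ||= {}      # create the level on first sight
      current = current[name]   # then descend into it
    end
  end
  tree
end

paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]

tree = build_tree(paths)
pp tree
```

Each path segment becomes a key, and a leaf maps to an empty Hash, which is why make_tree only recurses when the subtree is not empty.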

Edit 2:

Yes, it is true that in Ruby 1.8 hashes aren't guaranteed to maintain insertion order (Ruby 1.9 and later do preserve it). If that's an issue, there are ways to work around it. Here's a solution that retains order but doesn't bother with recursion and is much simpler for it:

require 'nokogiri'

paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]

doc = Nokogiri::XML::Document.new
doc.root = Nokogiri::XML::Node.new('root', doc)

paths.each do |path|
  curr = doc.root
  path.split("/").each do |name|
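    # Reuse the existing child if there is one; otherwise add it. add_child
    # returns the new node, whereas << returns the parent in current Nokogiri.
    curr = curr.xpath(name).first || curr.add_child(Nokogiri::XML::Node.new(name, doc))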
  end
end

print doc
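
If you would rather keep the recursive shape and still not depend on Hash ordering, one possible workaround is to store each level as an Array of [name, children] pairs. The following is only a sketch under that assumption: insert_path and render are hypothetical helper names, the output is a plain string with no XML escaping, and no wrapper root is added:

```ruby
# Each node is a [name, children] pair. children is an Array, so siblings stay
# in order of first appearance regardless of how Hash behaves.
def insert_path(children, names)
  return if names.empty?
  name = names.first
  node = children.find { |child_name, _| child_name == name }
  unless node
    node = [name, []]
    children << node
  end
  insert_path(node[1], names[1..-1])
end

# Recursively render one nesting level per call.
def render(children, indent = "")
  children.map do |name, kids|
    if kids.empty?
      "#{indent}<#{name}/>\n"
    else
      "#{indent}<#{name}>\n#{render(kids, indent + '  ')}#{indent}</#{name}>\n"
    end
  end.join
end

paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]

roots = []
paths.each { |path| insert_path(roots, path.split("/")) }
puts render(roots)
```

The linear find at each level is fine for small inputs like this one; for large trees the xpath-based version above or a real ordered map would be a better fit.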

Pesto answered May 06 '23 05:05