
Creating a large XML file in Ruby

I want to write approximately 50MB of data to an XML file.

I found Nokogiri (1.5.0) to be efficient for parsing, that is, for reading XML rather than writing it. It is not a good option for writing a large XML file, because its builder holds the complete document in memory until it is finally written out.

I found Builder (3.0.0) to be a good option but I'm not sure if it's the best option.

I tried some benchmarks with the following simple code:

  (1..500000).each do |k|
    xml.products {
      xml.widget {
        xml.id_ k
        xml.name "Awesome widget"
      }
    }
  end

Nokogiri took about 143 seconds, and memory consumption grew steadily, ending at about 700 MB.

Builder took about 123 seconds, and memory consumption stayed stable at about 10 MB.

So is there a better solution to write huge XML files (50 MB) in Ruby?

Here's the code using Nokogiri:

require 'rubygems'
require 'nokogiri'

start = Time.now
# The builder assembles the entire document in memory before it is serialized.
builder = Nokogiri::XML::Builder.new do |xml|
  xml.root {
    (1..500000).each do |k|
      xml.products {
        xml.widget {
          xml.id_ k
          xml.name "Awesome widget"
        }
      }
    end
  }
end
# The block form closes the file even if the write raises.
File.open("test_noko.xml", "w") { |f| f.write(builder.to_xml) }
puts Time.now - start

Here's the code using Builder:

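require 'rubygems'
require 'builder'

start = Time.now
File.open("test.xml", "w") { |f|
  # Builder writes each element to the target as it is generated.
  xml = Builder::XmlMarkup.new(:target => f, :indent => 1)
  # A single root element, as in the Nokogiri version, keeps the output well-formed.
  xml.root {
    (1..500000).each do |k|
      xml.products {
        xml.widget {
          xml.id_ k
          xml.name "Awesome widget"
        }
      }
    end
  }
}
puts Time.now - start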
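
A note on the timing itself: both scripts subtract two Time.now values, which are wall-clock readings and can shift if the system clock is adjusted during the run. A minimal sketch of an alternative, assuming a hypothetical helper named measure and a stand-in workload named write_records (neither is part of the scripts above), uses Ruby's monotonic clock instead:

```ruby
require 'stringio'

# Hypothetical helper: runs the block and returns [elapsed_seconds, block_result].
# Uses the monotonic clock, which is not affected by system clock adjustments.
def measure
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [Process.clock_gettime(Process::CLOCK_MONOTONIC) - started, result]
end

# Stand-in workload (an assumption): writes n records shaped like the ones above.
def write_records(io, n)
  io.puts '<root>'
  n.times do |k|
    io.puts "<products><widget><id>#{k}</id><name>Awesome widget</name></widget></products>"
  end
  io.puts '</root>'
  n
end

elapsed, count = measure { write_records(StringIO.new, 1_000) }
puts format('%d records in %.4f s', count, elapsed)
```

Any of the generators above could be dropped into the block in place of write_records. This measures elapsed time only; it says nothing about memory use.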
Gaurav Shah asked Sep 19 '11 05:09



1 Answer

Solution 1

If speed is your main concern, I'd just use libxml-ruby directly:

$ time ruby test.rb 

real    0m7.352s
user    0m5.867s
sys     0m0.921s

The API is pretty straightforward:

require 'rubygems'
require 'xml'

# The whole tree is built in memory, then saved in one call.
doc = XML::Document.new
doc.root = XML::Node.new('root_node')
root = doc.root

500000.times do |k|
  products = XML::Node.new('products')
  root << products
  widget = XML::Node.new('widget')
  products << widget
  # Note: id and name are stored as attributes here, not as child elements.
  widget['id'] = k.to_s
  widget['name'] = 'Awesome widget'
end

doc.save('foo.xml', :indent => false, :encoding => XML::Encoding::UTF_8)

Using :indent => true doesn't make much difference in this case, but for more complex XML files it might.

$ time ruby test.rb #(with indent)

real    0m7.395s
user    0m6.050s
sys     0m0.847s

Solution 2

Of course, the fastest solution, and one that doesn't build up memory, is to write the XML by hand. That easily introduces other sources of error, though, such as invalid XML:

$ time ruby test.rb 

real    0m1.131s
user    0m0.873s
sys     0m0.126s

Here's the code:

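# The block form closes the file even if a write raises.
File.open("foo.xml", "w") do |f|
  f.puts('<doc>')
  500000.times do |k|
    # Values are interpolated as-is; nothing is escaped here.
    f.puts "<product><widget id=\"#{k}\" name=\"Awesome widget\" /></product>"
  end
  f.puts('</doc>')
end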
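
The invalid-XML risk with hand-written output comes largely from unescaped characters such as &, < or " in the data. A minimal sketch of one way to reduce it, assuming a hypothetical helper named write_widget and relying on Ruby's built-in String#encode with the :xml option (the sample data is made up for illustration):

```ruby
require 'stringio'

# Hypothetical helper: writes one record to any IO-like object.
# encode(xml: :text) escapes &, < and >.
# encode(xml: :attr) also escapes " and wraps the value in double quotes.
def write_widget(io, id, name)
  io.puts "<products><widget id=#{id.to_s.encode(xml: :attr)}>" \
          "<name>#{name.encode(xml: :text)}</name></widget></products>"
end

# StringIO stands in for the output file so the sketch runs without touching disk.
out = StringIO.new
out.puts '<root>'
write_widget(out, 1, 'Nuts & "Bolts" <large>')
out.puts '</root>'
puts out.string
```

Passing a File opened for writing instead of the StringIO keeps memory use flat, as with the manual version above. Escaping only covers character data; balancing the tags is still up to the caller.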
sunkencity answered Nov 18 '22 23:11
