How can I use Nokogiri to write a HUGE XML file?

I have a Rails application that uses delayed_job in a reporting feature to run some very large reports. One of these generates a massive XML file, and with the bad, old way the code is written it can take literally days. Having seen impressive benchmarks on the internet, I thought Nokogiri could afford us some nontrivial performance gains.

However, the only examples I can find involve using the Nokogiri Builder to create an XML document, then calling .to_xml to write the whole thing out. But there isn't enough memory in my zip code to handle that for a file of this size.

So can I use Nokogiri to stream or write this data out to file?

asked Feb 09 '11 01:02 by AKWF


1 Answer

Nokogiri's Builder is designed to work in memory: you build a DOM, and Nokogiri converts it to XML when you serialize it with .to_xml. That makes it easy to use, but there are trade-offs, and holding the whole document in memory is one of them.

You might want to look into using Erubis to generate the XML instead. In Rails we'd normally gather all the data up front and keep the logic in a controller. To save memory, you can instead put the logic in the template and have it iterate over your data as it renders, which should reduce the resource demands.

If you need the XML in a file, you can redirect the command's output into one:

erubis options templatefile.erb > xmlfile

This is a very simple example, but it shows how easily you can define a template that generates XML:

<% 
asdf = (1..5).to_a 
%>
<xml>
  <element>
<% asdf.each do |i| %>
    <subelement><%= i %></subelement>
<% end %>
  </element>
</xml>

which, when I run erubis test.erb, outputs:

<xml>
  <element>
    <subelement>1</subelement>
    <subelement>2</subelement>
    <subelement>3</subelement>
    <subelement>4</subelement>
    <subelement>5</subelement>
  </element>
</xml>
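If you'd rather drive this from Ruby than from the erubis command, here is a minimal sketch that uses Ruby's standard-library ERB as a stand-in for Erubis (a swapped-in library; this is not Erubis's own API). It renders the same template, but passes the data in as `items` instead of defining it inside the template, and escapes each value. The names `XML_TEMPLATE` and `render_report` are hypothetical, and it assumes Ruby 2.6 or later for the `trim_mode:` keyword.

```ruby
require "erb"

# Same template as above, except the data arrives as `items`
# and each value is XML-escaped before it is written.
XML_TEMPLATE = <<~'ERB_SRC'
  <xml>
    <element>
  <% items.each do |i| %>
      <subelement><%= ERB::Util.html_escape(i) %></subelement>
  <% end %>
    </element>
  </xml>
ERB_SRC

# trim_mode "<>" omits the newline after lines that start with <% and end with %>,
# so the loop lines themselves leave no blank lines in the output.
def render_report(items)
  ERB.new(XML_TEMPLATE, trim_mode: "<>").result_with_hash(items: items)
end

# Printing to stdout keeps the redirection trick working.
# Prints the same <xml>...</xml> document as the erubis run above.
puts render_report(1..5)
```

Passing a Range or other enumerable means the template pulls values as it loops rather than relying on a prebuilt array. Keep in mind, though, that `result_with_hash` returns the whole rendered document as a single String, so on its own this does not keep memory flat.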

EDIT:

The string concatenation was taking forever...

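Yes, it can be, simply because of garbage collection. You don't show any code example of how you're building your strings, but Ruby works better when you use << to append one string to another than when you use +. Each + allocates a brand-new String and leaves the old one for the garbage collector, whereas << modifies the receiver in place.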
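A small sketch of the difference, building the same fragment both ways. The method names are illustrative only; your actual report code isn't shown in the question.

```ruby
# Rebinds `xml` to a brand-new String on every iteration; each old copy becomes garbage.
def build_with_plus(items)
  xml = ""
  items.each { |i| xml = xml + "<subelement>#{i}</subelement>\n" }
  xml
end

# Appends to one mutable String in place; no intermediate copies are created.
def build_with_append(items)
  xml = "".dup # dup guarantees an unfrozen buffer even with frozen string literals
  items.each { |i| xml << "<subelement>#{i}</subelement>\n" }
  xml
end

buf = "".dup
id_before = buf.object_id
buf << "a"
puts buf.object_id == id_before # prints true: << mutated the same object

other = "".dup
sum = other + "a"
puts sum.equal?(other) # prints false: + returned a new object
```

Both methods produce identical text; the difference is how many short-lived Strings the garbage collector has to clean up, which grows with every append when you use +.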

It might also work better not to keep everything in one string at all, but to write each piece to disk immediately, appending to an open file as you go.
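Here is a minimal sketch of that idea, assuming your data can be walked one record at a time. `each_record` and `write_report` are hypothetical names; in the real Rails job the data would come from your report query (for example an ActiveRecord relation iterated with `find_each`), which is an assumption here.

```ruby
require "erb" # only for ERB::Util.html_escape

# Hypothetical data source: yields one record at a time instead of building an array.
def each_record
  return enum_for(:each_record) unless block_given?
  (1..5).each { |i| yield({ id: i, name: "item <#{i}>" }) }
end

# Streams the XML straight to disk, so only one record's text exists at a time.
def write_report(path)
  File.open(path, "w") do |f|
    f << %(<?xml version="1.0" encoding="UTF-8"?>\n)
    f << "<report>\n"
    each_record do |rec|
      # html_escape turns &, <, >, " and ' into entities so the XML stays well-formed.
      name = ERB::Util.html_escape(rec[:name])
      f << %(  <record id="#{rec[:id]}">#{name}</record>\n)
    end
    f << "</report>\n"
  end
end
```

Calling `write_report("report.xml")` writes the declaration, the `<report>` wrapper and one `<record>` line per item. Because nothing accumulates in a Ruby String, memory use stays roughly constant no matter how many records there are; the trade-off is that you are responsible for escaping and for closing every element yourself.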

Again, without code examples I'm shooting in the dark about what you might be doing or why things run slow.

answered Nov 05 '22 12:11 by the Tin Man