Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating very large XML files in Python?

Tags:

python

xml

lxml

Does anyone know of a memory efficient way to generate very large xml files (e.g. 100-500 MiB) in Python?

I've been utilizing lxml, but memory usage is through the roof.

like image 287
Kenneth Reitz Avatar asked Jun 15 '10 21:06

Kenneth Reitz


People also ask

How do you create an XML file in Python?

Creating XML Document using Python First, we import minidom for using xml. dom . Then we create the root element and append it to the XML. After that creating a child product of parent namely Geeks for Geeks.

How much data can XML handle?

There is no limit of XML file size but it takes memory (RAM) as file size of XML file, so long XML file parsing size is performance hit.


2 Answers

Perhaps you could use a templating engine instead of generating/building the xml yourself?

Genshi for example is xml-based and supports streaming output. A very basic example:

from genshi.template import MarkupTemplate

tpl_xml = '''
<doc xmlns:py="http://genshi.edgewall.org/">
<p py:for="i in data">${i}</p>
</doc>
'''

tpl = MarkupTemplate(tpl_xml)
stream = tpl.generate(data=xrange(10000000))

with open('output.xml', 'w') as f:
    stream.render(out=f)

It might take a while, but memory usage remains low.

The same example for the Mako templating engine (not "natively" xml), but a lot faster:

from mako.template import Template
from mako.runtime import Context

tpl_xml = '''
<doc>
% for i in data:
<p>${i}</p>
% endfor
</doc>
'''

tpl = Template(tpl_xml)

with open('output.xml', 'w') as f:
    ctx = Context(f, data=xrange(10000000))
    tpl.render_context(ctx)

The last example ran on my laptop for about 20 seconds, producing a (admittedly very simple) 151 MB xml file, no memory problems at all. (according to Windows task manager it remained constant at about 10MB)

Depending on your needs, this might be a friendlier and faster way of generating xml than using SAX etc... Check out the docs to see what you can do with these engines (there are others as well, I just picked out these two as examples)

like image 73
Steven Avatar answered Sep 23 '22 05:09

Steven


The only sane way to generate so large an XML file is line by line, which means printing while running a state machine, and lots of testing.

like image 37
Ignacio Vazquez-Abrams Avatar answered Sep 22 '22 05:09

Ignacio Vazquez-Abrams