Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to stream XML output quickly from Python

Tags:

python

xml

sax

What is a quick way of writing an XML file iteratively (i.e. without having the whole document in memory)? xml.sax.saxutils.XMLGenerator works but is slow, around 1MB/s on an I7 machine. Here is a test case.

like image 394
Alex Morega Avatar asked Oct 21 '13 18:10

Alex Morega


People also ask

How do I get XML data in Python?

To read an XML file using ElementTree, firstly, we import the ElementTree class found inside xml library, under the name ET (common convension). Then passed the filename of the xml file to the ElementTree. parse() method, to enable parsing of our xml file. Then got the root (parent tag) of our xml file using getroot().

How do you traverse XML in Python?

There are two ways to parse the file using 'ElementTree' module. The first is by using the parse() function and the second is fromstring() function. The parse () function parses XML document which is supplied as a file whereas, fromstring parses XML when supplied as a string i.e within triple quotes.

How parse XML URL in Python?

Use lxml objectify, it will parse the xml into Python objects.


2 Answers

I realize that this question has been asked awhile ago, but, in the mean time, an lxml API has been introduced that looks promising in terms of addressing the problem: http://lxml.de/api.html ; specifically, refer to the following section: "Incremental XML generation".

I quickly tested it by streaming a 10M file just as in your benchmark, and it took a fraction of a second on my old laptop, which is by no means very scientific, but is quite in the same ballpark as your generate_large_xml() function.

like image 193
Yury V. Zaytsev Avatar answered Sep 27 '22 01:09

Yury V. Zaytsev


As Yury V. Zaytsev mentioned, lxml realy provides API for generating XML documents in streaming manner

Here is working example:

from lxml import etree

fname = "streamed.xml"
with open(fname, "w") as f, etree.xmlfile(f) as xf:
    attribs = {"tag": "bagggg", "text": "att text", "published": "now"}
    with xf.element("root", attribs):
        xf.write("root text\n")
        for i in xrange(10):
            rec = etree.Element("record", id=str(i))
            rec.text = "record text data"
            xf.write(rec)

Resulting XML looks like this (the content reformatted from one-line XML doc):

<?xml version="1.0"?>
<root text="att text" tag="bagggg" published="now">root text
    <record id="0">record text data</record>
    <record id="1">record text data</record>
    <record id="2">record text data</record>
    <record id="3">record text data</record>
    <record id="4">record text data</record>
    <record id="5">record text data</record>
    <record id="6">record text data</record>
    <record id="7">record text data</record>
    <record id="8">record text data</record>
    <record id="9">record text data</record>
</root>
like image 32
Jan Vlcinsky Avatar answered Sep 26 '22 01:09

Jan Vlcinsky