What is a quick way of writing an XML file iteratively (i.e. without having the whole document in memory)? xml.sax.saxutils.XMLGenerator works but is slow, around 1 MB/s on an i7 machine. Here is a test case.
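As a stand-in for the asker's test case, which is not shown here, a minimal sketch of streaming output with XMLGenerator might look like the following. The helper name write_records and the record layout are assumptions for illustration, not the asker's benchmark.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_records(stream, count):
    """Emit <root> with `count` <record> children, one SAX event at a time.

    `write_records` is a hypothetical helper; only the XMLGenerator calls
    come from the standard library.
    """
    gen = XMLGenerator(stream, encoding="utf-8")
    gen.startDocument()
    gen.startElement("root", {})
    for i in range(count):
        # Each record is written immediately; nothing is kept in memory.
        gen.startElement("record", {"id": str(i)})
        gen.characters("record text data")
        gen.endElement("record")
    gen.endElement("root")
    gen.endDocument()


# Write into an in-memory text buffer here; a real run would pass an open file.
buf = io.StringIO()
write_records(buf, 3)
xml_text = buf.getvalue()
```

Every element costs several Python-level method calls, which is one plausible reason this approach is slow for large outputs.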
To read an XML file using ElementTree, first import the ElementTree module from the xml library under the name ET (a common convention). Then pass the filename of the XML file to the ET.parse() function to parse it, and get the root (parent tag) of the document using getroot().
There are two ways to parse XML using the ElementTree module: the parse() function and the fromstring() function. parse() parses an XML document supplied as a file, whereas fromstring() parses XML supplied as a string, for example a triple-quoted literal.
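A small sketch of both entry points, using only the standard library. The sample document and the temporary file are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Sample document, invented for this example.
xml_text = """<root>
    <record id="0">record text data</record>
    <record id="1">record text data</record>
</root>"""

# fromstring() parses a string and returns the root element directly.
root_from_string = ET.fromstring(xml_text)

# parse() reads from a file and returns an ElementTree;
# getroot() then gives the root element.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as tmp:
    tmp.write(xml_text)
    path = tmp.name
try:
    root_from_file = ET.parse(path).getroot()
finally:
    os.remove(path)

# Both roots expose the same tree; collect the id attribute of each record.
ids = [rec.get("id") for rec in root_from_file.iter("record")]
```

Note that both functions build the whole tree in memory, so they address reading rather than the streaming-write problem from the question.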
Use lxml objectify; it parses the XML into Python objects.
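A minimal sketch of what that looks like, assuming lxml is installed. The sample XML is invented for illustration.

```python
from lxml import objectify

# Sample document, invented for this example.
xml = b"<root><record id='0'>first</record><record id='1'>second</record></root>"

root = objectify.fromstring(xml)

# Child elements are reachable as attributes; iterating over root.record
# walks all sibling <record> elements.
ids = [rec.get("id") for rec in root.record]
texts = [rec.text for rec in root.record]
```

Like ElementTree, objectify holds the full document in memory, so it suits convenient reading more than streaming output.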
I realize that this question was asked a while ago, but in the meantime an lxml API has been introduced that looks promising for this problem: http://lxml.de/api.html ; specifically, see the section "Incremental XML generation".
I quickly tested it by streaming a 10M file just as in your benchmark, and it took a fraction of a second on my old laptop. That is by no means very scientific, but it is in the same ballpark as your generate_large_xml() function.
As Yury V. Zaytsev mentioned, lxml really does provide an API for generating XML documents in a streaming manner.
Here is a working example:
from lxml import etree

fname = "streamed.xml"
# xmlfile writes bytes, so the file is opened in binary mode
with open(fname, "wb") as f, etree.xmlfile(f) as xf:
    attribs = {"tag": "bagggg", "text": "att text", "published": "now"}
    with xf.element("root", attribs):
        xf.write("root text\n")
        for i in range(10):  # use xrange on Python 2
            # each record is built, written out, and then no longer referenced
            rec = etree.Element("record", id=str(i))
            rec.text = "record text data"
            xf.write(rec)
The resulting XML looks like this (reformatted from the one-line XML document):
<?xml version="1.0"?>
<root text="att text" tag="bagggg" published="now">root text
<record id="0">record text data</record>
<record id="1">record text data</record>
<record id="2">record text data</record>
<record id="3">record text data</record>
<record id="4">record text data</record>
<record id="5">record text data</record>
<record id="6">record text data</record>
<record id="7">record text data</record>
<record id="8">record text data</record>
<record id="9">record text data</record>
</root>