Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to parse XML file in chunks

Tags:

python

I have a very large XML file with 40,000 tag elements. When i am using element tree to parse this file it's giving errors due to memory. So is there any module in python that can read the xml file in data chunks without loading the entire xml into memory?And How i can implement that module?

like image 230
Kratos85 Avatar asked Feb 12 '12 13:02

Kratos85


2 Answers

Probably the best library for working with XML in Python is lxml, in this case you should be interested in iterparse/iterwalk.

like image 84
zeekay Avatar answered Sep 19 '22 12:09

zeekay


This is a problem that people usually solve using sax.

If your huge file is basically a bunch of XML documents aggregated inside and overall XML envelope, then I would suggest using sax (or plain string parsing) to break it up into a series of individual documents that you can then process using lxml.etree.

like image 23
Michael Dillon Avatar answered Sep 21 '22 12:09

Michael Dillon