I have a very large XML file with 40,000 tag elements. When I use ElementTree to parse this file it gives memory errors. Is there a Python module that can read the XML file in chunks without loading the entire document into memory? And how would I use that module?
Probably the best library for working with XML in Python is lxml; in this case you should be interested in iterparse/iterwalk.
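lxml's `iterparse` mirrors the API of the standard library's `xml.etree.ElementTree.iterparse`, so a minimal sketch looks like the following (using the stdlib version here; swap in `lxml.etree` for the same pattern, and note that the `<item>` tag name and the in-memory sample document are assumptions standing in for your real file):

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a huge file on disk; in practice you would pass a
# filename or open file object to iterparse instead.
xml_data = io.BytesIO(
    b"<root>"
    + b"".join(b"<item id='%d'>value</item>" % i for i in range(5))
    + b"</root>"
)

count = 0
# iterparse yields (event, element) pairs incrementally, so the whole
# tree is never held in memory at once.
for event, elem in ET.iterparse(xml_data, events=("end",)):
    if elem.tag == "item":
        count += 1
        # ... process elem here ...
        elem.clear()  # release the element's children once processed

print(count)
```

The `elem.clear()` call is the important part for memory: without it, already-processed elements stay attached to the tree and the parse still grows without bound.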
This is a problem that people usually solve using SAX.
If your huge file is basically a bunch of XML documents aggregated inside an overall XML envelope, then I would suggest using SAX (or plain string parsing) to break it up into a series of individual documents that you can then process using lxml.etree.
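A minimal SAX sketch using the standard library's `xml.sax`: the handler receives callbacks as the parser streams through the file, so memory use stays flat regardless of document size. The `<item>` tag name and the inline sample document are assumptions for illustration:

```python
import io
import xml.sax

class ItemCounter(xml.sax.ContentHandler):
    """Counts <item> elements as they stream past the parser."""

    def __init__(self):
        super().__init__()
        self.count = 0
        self.text = []

    def startElement(self, name, attrs):
        if name == "item":
            self.text = []  # start buffering this element's text

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        if name == "item":
            self.count += 1
            # ... hand "".join(self.text) off for per-item processing ...

handler = ItemCounter()
# In practice, pass an open file or filename instead of StringIO.
xml.sax.parse(
    io.StringIO("<root><item>a</item><item>b</item><item>c</item></root>"),
    handler,
)
print(handler.count)
```

For the envelope-splitting approach described above, `endElement` is where you would assemble each inner document and feed it to `lxml.etree.fromstring` for the heavier per-document work.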