I'm parsing an XML file (291 MB) in Python 3.5 with:
import xmltodict, json
with open('Wikipedia-20160404094133.xml', encoding='utf-8') as xml_file:
    dic_xml = xmltodict.parse(xml_file.read(), encoding='utf-8', xml_attribs=True)
but I get the error:
dic_xml = xmltodict.parse(xml_file.read(), encoding='utf-8', xml_attribs=True)
MemoryError
What can I do to solve this?
Check out what the xmltodict README says:
"xmltodict is very fast (Expat-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps like Discogs or Wikipedia"
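Essentially, xml_file.read() loads the entire file into memory and xmltodict then builds one huge dictionary on top of it. You need to process the file piece by piece instead, and xmltodict's "streaming mode" (the item_depth and item_callback arguments to parse) seems to be built for this.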
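Here is a minimal sketch of what streaming mode could look like for this case. It assumes the dump has the usual MediaWiki export layout, with <page> elements directly under the root element, and that only the page titles are needed. The names collect_titles and handle_item are made up for illustration, so adjust item_depth and the callback to your actual file.

```python
import xmltodict


def collect_titles(xml_file):
    """Stream a MediaWiki-style dump and return the list of page titles.

    xml_file must be a file object opened in binary mode ('rb'), because
    xmltodict passes it straight to the Expat parser.
    """
    titles = []

    def handle_item(path, item):
        # path is a list of (element_name, attributes) pairs from the root
        # down to the element that just finished parsing.
        name, _attrs = path[-1]
        # Assumption: pages sit at depth 2 and contain a <title> child.
        if name == 'page' and isinstance(item, dict):
            titles.append(item.get('title'))
        # Returning True tells xmltodict to keep parsing.
        return True

    # With item_depth set, xmltodict hands each depth-2 element to the
    # callback and then discards it instead of building one big dict.
    xmltodict.parse(xml_file, item_depth=2, item_callback=handle_item)
    return titles
```

You would call it with something like open('Wikipedia-20160404094133.xml', 'rb') as the argument. Note that the file object itself is passed, not xml_file.read(), so the whole file is never held in memory at once.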