Memory error in Python while parsing a 300 MB file

Question

I'm parsing an XML file (291 MB) in Python 3.5 with:

import xmltodict, json

with open('Wikipedia-20160404094133.xml', encoding='utf-8') as xml_file:
    dic_xml = xmltodict.parse(xml_file.read(), encoding='utf-8', xml_attribs=True)

but I get the error:

dic_xml = xmltodict.parse(xml_file.read(), encoding='utf-8', xml_attribs=True)
MemoryError

What can I do to solve this?

jDo · Accepted Answer

Check out xmltodict's streaming mode. From its documentation:

"xmltodict is very fast (Expat-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps like Discogs or Wikipedia"

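Essentially, you need to process the file in chunks instead of loading it all at once with xml_file.read(), and xmltodict's "streaming mode" seems to be built for this: you pass the file object itself along with item_depth and item_callback, and each item is handed to your callback instead of being accumulated in one huge dictionary.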
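As a rough sketch of what streaming mode could look like here: the element names (mediawiki, siteinfo, page, title) and the depth of 2 are assumptions about the layout of a Wikipedia export, and stream_items and collect_title are hypothetical helper names, so adjust them to your file. The in-memory sample only stands in for the real file.

```python
import io

import xmltodict


def stream_items(xml_source, item_depth, handle_item):
    """Parse xml_source incrementally and call handle_item(path, item)
    for every element found at item_depth, without building one big dict."""

    def callback(path, item):
        handle_item(path, item)
        # xmltodict stops parsing if the callback returns a falsy value.
        return True

    xmltodict.parse(xml_source, item_depth=item_depth, item_callback=callback)


# Small stand-in for the real dump (assumed structure, not the actual file).
sample = io.BytesIO(
    b"<mediawiki>"
    b"<siteinfo><sitename>Wikipedia</sitename></siteinfo>"
    b"<page><title>A</title></page>"
    b"<page><title>B</title></page>"
    b"</mediawiki>"
)

titles = []


def collect_title(path, item):
    # path is a list of (tag, attributes) pairs from the root down to this item.
    if path[-1][0] == "page":
        titles.append(item["title"])


stream_items(sample, 2, collect_title)
print(titles)
```

For the real file you would open it in binary mode and pass the file object directly, e.g. with open('Wikipedia-20160404094133.xml', 'rb') as xml_file: stream_items(xml_file, 2, collect_title), so that nothing calls xml_file.read() on the whole 291 MB. Keep the callback light, too: if it appends every full page to a list, you end up with the same memory problem again.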

Tags: python, memory, parsing

Knokkelgeddon
