In Python 2.7, when I load all the data from a 2.5 GB text file into memory for quicker processing, like this:
>>> f = open('dump.xml','r')
>>> dump = f.read()
I got the following error:
Python(62813) malloc: *** mmap(size=140521659486208) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
Why did Python try to allocate 140521659486208 bytes of memory for 2563749237 bytes of data? How do I fix the code so that it loads all the bytes?
I have around 3 GB of RAM free. The file is a Wiktionary XML dump.
If you use mmap, the entire file is mapped into your process's address space immediately; the pages are then read from disk on demand, so you avoid one huge up-front allocation.
import mmap

with open('dump.xml', 'rb') as f:
    # Length 0 maps the ENTIRE file.
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)  # file is mapped read-only
    # Proceed with your code here -- the file's pages are brought into
    # memory on demand, so readline() here will be as fast as could be
    data = m.readline()
    while data:
        # Do stuff
        data = m.readline()
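Here is a minimal self-contained sketch of the same pattern. It writes a small sample file so it can run anywhere (with a real dump you would open 'dump.xml' instead), and it uses access=mmap.ACCESS_READ, which works on both Unix and Windows, whereas prot=mmap.PROT_READ is Unix-only:

```python
import mmap
import os
import tempfile

# Create a small sample file so the sketch is runnable as-is;
# substitute your real 'dump.xml' in practice.
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b'<page>\n<title>foo</title>\n</page>\n')
    path = tmp.name

lines = []
with open(path, 'rb') as f:
    # Length 0 maps the whole file; ACCESS_READ is the portable
    # way to request a read-only mapping.
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    line = m.readline()
    while line:
        lines.append(line)
        line = m.readline()
    m.close()

os.remove(path)
print(lines)
```

Note that readline() on an mmap object returns bytes, so on Python 3 you would decode each line before treating it as text.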