I've been working on a project that involves loading a relatively large dictionary into memory from a file. The dictionary has just under 2 million entries, each entry (key and value combined) is under 20 bytes. The size of the file on disk is 38 MB.
My problem is that when I try to load the dictionary, my program immediately expands to over 2.5 gigabytes of memory used.
Here is the code I use to read the dictionary in from disk:
f = open('someFile.txt', 'r')
rT = eval(f.read())
f.close()
I assume the memory is being used to parse the dictionary literal into an AST before evaluating it.
For this kind of use, it's much better to go with the cPickle module instead of repr/eval.
import cPickle

x = {}
for i in xrange(1000000):
    x["k%i" % i] = "v%i" % i

# Open in binary mode and close the files when done.
with open("data", "wb") as f:
    cPickle.dump(x, f, -1)

with open("data", "rb") as f:
    x = cPickle.load(f)
Passing -1 when dumping means "use the latest protocol", which is more efficient but possibly not backward compatible with older Python versions. Whether that trade-off is a good idea depends on why you need to dump/load.
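As a side note for anyone on Python 3: cPickle was folded into the standard pickle module there, so the same approach looks like the following sketch (the "data" filename is just the example name carried over from above):

```python
# Python 3 equivalent of the cPickle snippet above.
import pickle

# Build a dictionary of roughly the size described in the question.
x = {"k%i" % i: "v%i" % i for i in range(1000000)}

# HIGHEST_PROTOCOL plays the same role as the -1 argument to cPickle.dump.
with open("data", "wb") as f:
    pickle.dump(x, f, protocol=pickle.HIGHEST_PROTOCOL)

with open("data", "rb") as f:
    y = pickle.load(f)

# The round trip preserves the dictionary exactly.
assert y == x
```

The file produced by pickle is a compact binary representation, so loading it avoids both the parsing step and the memory blow-up that eval on a 38 MB literal causes.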