Python dictionary memory usage

I've been working on a project that involves loading a relatively large dictionary into memory from a file. The dictionary has just under 2 million entries, and each entry (key and value combined) is under 20 bytes. The file is 38 MB on disk.

My problem is that when I try to load the dictionary, my program's memory usage immediately grows to over 2.5 GB.

Here is the code I use to read the dictionary in from disk:

f = open('someFile.txt', 'r')
rT = eval(f.read())
f.close()
dckrooney asked May 07 '11 21:05


1 Answer

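I think the memory is being used while parsing the dictionary literal: eval has to build an AST for the entire source text before it can construct the dictionary itself.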
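One way to check that guess is to measure peak allocations for both loading paths. The following is a rough sketch, not a benchmark: it uses Python 3's tracemalloc and pickle modules (the question predates both being the norm), the peak_bytes helper and the 20,000-entry sample dictionary are made up for illustration, and the numbers it prints will vary by Python version and platform.

```python
import pickle
import tracemalloc


def peak_bytes(func, *args):
    """Run func(*args); return (result, peak traced allocation in bytes)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Hypothetical stand-in for the real data: short string keys and values.
sample = {"k%d" % i: "v%d" % i for i in range(20000)}

as_source = repr(sample)                        # the text form that eval() reads
as_pickle = pickle.dumps(sample, protocol=-1)   # the binary form pickle reads

from_eval, eval_peak = peak_bytes(eval, as_source)
from_pickle, pickle_peak = peak_bytes(pickle.loads, as_pickle)

print("eval peak:   %d bytes" % eval_peak)
print("pickle peak: %d bytes" % pickle_peak)
```

tracemalloc only sees allocations made through Python's allocators, so treat the two figures as a relative indication rather than the process's real memory footprint.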

For this kind of use, the cPickle module is a much better choice than repr/eval.

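import cPickle

# Build a sample dictionary with one million short string entries.
x = {}
for i in xrange(1000000):
    x["k%i" % i] = "v%i" % i

# Dump it with the latest pickle protocol (-1); "with" closes the file.
with open("data", "wb") as f:
    cPickle.dump(x, f, -1)

# Load it back.
with open("data", "rb") as f:
    x = cPickle.load(f)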

Passing -1 as the protocol when dumping selects the latest protocol, which is more efficient but possibly not backward compatible with older Python versions. Whether this is a good idea depends on why you need to dump and load the data.
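For readers on Python 3, where cPickle no longer exists as a separate module and pickle uses its C implementation automatically when available, the same round trip might look like the sketch below. The temporary file path and the small sample dictionary are placeholders chosen for illustration; pickle.HIGHEST_PROTOCOL is the named equivalent of passing -1.

```python
import os
import pickle
import tempfile

# Small illustrative dictionary; the real one would have millions of entries.
data = {"k%d" % i: "v%d" % i for i in range(1000)}

# Hypothetical scratch location; substitute the real file path.
path = os.path.join(tempfile.mkdtemp(), "data.pkl")

# Write with the newest protocol this interpreter supports (same effect as -1).
with open(path, "wb") as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back; the protocol is detected from the file contents.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(len(restored), "entries restored")
```

If the file has to be read by an older interpreter, passing an explicit lower protocol number instead of the highest one is the safer choice.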

6502 answered Sep 29 '22 21:09
