I need to create an in memory object that has keys that are a 9 digit integer and a boolean value associated with each key. I've been using a dict as in the simplified example below:
#!/usr/bin/python
from __future__ import print_function
import sys
myDict = {}
for n in range(56000):
myDict[n] = True
print('Count:',len(myDict),' Size:', sys.getsizeof(myDict))
I need to be able to look up and retrieve the boolean value associated with each key. The problem is the size of the dict. Using Python 2.7 on a 64 bit Linux system and the above example, the size of the dict is 3.1 megabytes according to sys.getsizeof(). (about 56 bytes per entry to store 9 digits plus a boolean value)
I need to store the boolean state of (approx) 55.000 entries in the dict. Each dict key is a 9 digit integer. I've tried using the integer and str(theInteger) as keys with no change in the size of the dict.
Is there some other sort of data structure or methodology I should be using to conserve memory with such a large data set?
This sums up to at least 12 bytes on a 32bit machine and 24 bytes on a 64bit machine. The dictionary starts with 8 empty buckets. This is then resized by doubling the number of entries whenever its capacity is reached.
The Python dictionary implementation consumes a surprisingly small amount of memory. But the space taken by the many int and (in particular) string objects, for reference counts, pre-calculated hash codes etc., is more than you'd think at first.
Python does not free memory back to the system immediately after it destroys some object instance. It has some object pools, called arenas, and it takes a while until those are released. In some cases, you may be suffering from memory fragmentation which also causes process' memory usage to grow. sys.
In disk you have just strings, when loading to Python the interpreter has to create a whole structure for each string and for each dictionary, besides the string itself. There's no way to reduce the memory used by the dicts, but there are other ways to approach the problem.
There's no way to reduce the memory used by the dicts, but there are other ways to approach the problem. If you're willing to trade some speed for memory, you should consider storing and querying the strings from an SQLite file instead of loading everything to dictionaries in memory.
So you can reduce memory usage even further, to about 8MB, by using a Pandas DataFrame to store the information: it will use NumPy arrays to efficiently store the numbers internally. In general, storing too many Python objects at once will waste memory. As always, solutions can involve compression, batching, or indexing:
Not bad; given how often dictionaries are used in Python, it’s good to know that they don’t normally consume that much memory. What if I add something to the dict? What will happen to the memory usage? Something seems a bit fishy here, right?
If you look up your boolean with an integer key, and the range of keys starts with 0 and is continuous, there's really no reason not to use a list:
my_list = []
for n in range(56000):
my_list[n] = True
or better:
my_list = [True for n in range(5600])
If that is not enough, try the array
module and use one byte per bool:
import array
my_array = array.array("b", (True for n in range(56000)))
And if that is not good enough, try the bitarray module on PyPi.
Another idea is to use a set
: Say you have many more False
than True
, simply have a set:
my_true_numbers = {0, 2323, 23452} # just the True ones
and check with
value = number in my_true_numbers
If you have more True
than False
, do it the other way round.
The accepted answer of Python: Reducing memory usage of dictionary has the conclusion that there it not much you can do and i agree with it. The overall memory overhead of a dictionary is small but the number of key-value pairs of your example raises the memory footprint.
There might one think possible: If the keys are always linear you could create a list of booleans directly or better use bitarray. The keys would then be implicit. But if this is only in your example the case you can't do much.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With