I am trying to reduce the memory consumption of a Python dict, which in my case serves as a word --> document_id "inverted index". Each word is hashed as an integer, which takes up 24 bytes.

I was wondering if I can convert each element within the dict's values, and each key within the dict, to a bitarray instead. I've noticed that the max value of any encountered int is less than 2^22, so I can maybe just allocate a bit-array of "size 22". How can this be done? So far I've seen the gmpy2 and bitarray libraries, as well as std::bitset in the C++ stdlib, which I can use with Cython. I've read from this post that bitarray is not as fast as gmpy. In gmpy, I am not sure how to set the size. Finally, I wonder if the memory overhead of gmpy or bitarray objects in Python is worth it, when I can just use std::bitset, which probably uses the least memory of all.
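For concreteness, here is roughly what the index looks like (the names and values below are made up):

import sys

# Toy inverted index: hashed word (int) --> list of document ids (ints).
# In my data, every key and every id is below 2**22.
inverted_index = {
    1842193: [17, 42, 1048575],
    955411: [42, 2096128],
}

print(sys.getsizeof(1842193))  # 24 on my machine; the exact figure depends on the Python build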
>>> import sys
>>> sys.getsizeof(1)
24
That's 24 bytes, just for an integer, on my machine. For Python objects, object overhead is going to swamp all other costs for such small values. Even if you could shave off 10 bits (you can't; memory allocation doesn't work that way), that saving is peanuts compared to the refcount, the type pointer, and the dict entry itself (which isn't included in that 24-byte figure), even before you get to the actual data.
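A rough way to see where the memory actually goes (the exact numbers vary with Python version and platform, so treat these as illustrative):

import sys

print(sys.getsizeof(1))        # ~24 bytes: refcount + type pointer + the value itself
print(sys.getsizeof(2 ** 21))  # same size: a 22-bit payload costs no less than a 1-bit one

# The dict's own per-entry cost comes on top of that, and it is not
# included in getsizeof() of the keys and values.
d = {}
empty = sys.getsizeof(d)
for i in range(1000):
    d[i] = i
print((sys.getsizeof(d) - empty) / 1000.0)  # roughly tens of bytes per slot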
bitarray isn't going to help; it's probably bigger than an int. I don't know about std::bitset, since I'm not sure what overhead Cython adds to it, but it's going to have overhead for things like the bit count. If anything, I'd expect a Cython int to work best, but it probably needs to be converted to a regular Python int to go in a dict.
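If you have bitarray installed, a quick sanity check along these lines (just getsizeof, not a benchmark) should show the problem:

import sys
from bitarray import bitarray  # assumes the bitarray package is available

b = bitarray(22)  # 22 bits of payload
b.setall(0)

# The wrapper object alone is already in the same ballpark as the int it
# would replace, and depending on the bitarray version getsizeof() may not
# even count the separately allocated bit buffer.
print(sys.getsizeof(b))
print(sys.getsizeof(1))  # the int you were hoping to shrink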