Setup
I have a set of unique integers with values between 1 and 50 million. New integers are added at random e.g. numberset.add(random.randint(1, 50000000))
. I need to be able to quickly add new integers and quickly check if an integer is already present.
Problem
After a while, the set grows too large for my low memory system and I experience MemoryError
s.
Question
How can I achieve this while using less memory? What's the fastest way to do this using the disk without reconfiguring the system e.g. swapfiles? Should I use a database file like sqlite? Is there a library that will compress the integers in memory?
You can avoid dependencies on 3rd-party bit-array modules by writing your own -- the functionality required is rather minimal:
import array
BITS_PER_ITEM = array.array('I').itemsize * 8
def make_bit_array(num_bits, initially=0):
num_items = (num_bits + BITS_PER_ITEM - 1) // BITS_PER_ITEM
return array.array('I', [initially]) * num_items
def set_bit(bit_array, offset):
item_index = offset // BITS_PER_ITEM
bit_index = offset % BITS_PER_ITEM
bit_array[item_index] |= 1 << bit_index
def clear_bit(bit_array, offset):
item_index = offset // BITS_PER_ITEM
bit_index = offset % BITS_PER_ITEM
bit_array[item_index] &= ~(1 << bit_index)
def get_bit(bit_array, offset):
item_index = offset // BITS_PER_ITEM
bit_index = offset % BITS_PER_ITEM
return (bit_array[item_index] >> bit_index) & 1
Use a bit-array.This will reduce the need for huge space requirement.
Realted SO Question:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With