Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Massive Python set of integers using too much memory

Setup

  • Python 2.6
  • Ubuntu x64

I have a set of unique integers with values between 1 and 50 million. New integers are added at random e.g. numberset.add(random.randint(1, 50000000)). I need to be able to quickly add new integers and quickly check if an integer is already present.

Problem

After a while, the set grows too large for my low memory system and I experience MemoryErrors.

Question

How can I achieve this while using less memory? What's the fastest way to do this using the disk without reconfiguring the system e.g. swapfiles? Should I use a database file like sqlite? Is there a library that will compress the integers in memory?

like image 223
2371 Avatar asked Dec 12 '22 18:12

2371


2 Answers

You can avoid dependencies on 3rd-party bit-array modules by writing your own -- the functionality required is rather minimal:

import array

BITS_PER_ITEM = array.array('I').itemsize * 8

def make_bit_array(num_bits, initially=0):
    num_items = (num_bits + BITS_PER_ITEM - 1) // BITS_PER_ITEM
    return array.array('I', [initially]) * num_items

def set_bit(bit_array, offset):
    item_index = offset // BITS_PER_ITEM
    bit_index = offset % BITS_PER_ITEM
    bit_array[item_index] |= 1 << bit_index

def clear_bit(bit_array, offset):
    item_index = offset // BITS_PER_ITEM
    bit_index = offset % BITS_PER_ITEM
    bit_array[item_index] &= ~(1 << bit_index)

def get_bit(bit_array, offset):
    item_index = offset // BITS_PER_ITEM
    bit_index = offset % BITS_PER_ITEM
    return (bit_array[item_index] >> bit_index) & 1
like image 142
John Machin Avatar answered Jan 22 '23 07:01

John Machin


Use a bit-array.This will reduce the need for huge space requirement.

Realted SO Question:

  • Python equivalent to Java's BitSet
like image 25
Emil Avatar answered Jan 22 '23 07:01

Emil