Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is a hashtable/dictionary implementation for Python that doesn't store the keys?

I'm storing millions, possibly billions of 4 byte values in a hashtable and I don't want to store any of the keys. I expect that only the hashes of the keys and the values will have to be stored. This has to be fast and all kept in RAM. The entries would still be looked up with the key, unlike set()'s.

What is an implementation of this for Python? Is there a name for this?

Yes, collisions are allowed and can be ignored.

(I can make an exception for collisions, the key can be stored for those. Alternatively, collisions can just overwrite the previously stored value.)

like image 703
user213060 Avatar asked Dec 16 '09 23:12

user213060


People also ask

Are Python dictionaries implemented as hash tables?

Python dictionaries are implemented as hash tables. Hash tables must allow for hash collisions i.e. even if two distinct keys have the same hash value, the table's implementation must have a strategy to insert and retrieve the key and value pairs unambiguously.

How is Hashtable implemented in Python?

Python's built-in “hash” function is used to create a hash value of any key. This function is useful as it creates an integer hash value for both string and integer key. The hash value for integer will be same as it is, i.e. hash(10) will be 10, hash(20) will be 20, and so on.

Is Hashtable same as dictionary Python?

Hash tables or has maps in Python are implemented through the built-in dictionary data type. The keys of a dictionary in Python are generated by a hashing function. The elements of a dictionary are not ordered and they can be changed.


1 Answers

Bloomier filters - space-efficient associative array

From the Wikipedia:

Chazelle et al. (2004) designed a generalization of Bloom filters that could associate a value with each element that had been inserted, implementing an associative array. Like Bloom filters, these structures achieve a small space overhead by accepting a small probability of false positives. In the case of "Bloomier filters", a false positive is defined as returning a result when the key is not in the map. The map will never return the wrong value for a key that is in the map.

like image 62
jfs Avatar answered Nov 14 '22 21:11

jfs