Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What hash algorithm does Python's dictionary mapping use?

Tags:

python

hashmap

I was messing around with making a command line parser and was wondering what kind of hash algorithm python dict's use?

The way I have it set up, I have a pattern match algorithm which matches tokenized input sequences with a dictionary key. Some of the keys are relatively long (length 5 or 6 tuples of 6-7 character strings). I was wondering if there was a point at which long dictionary keys significantly reduce the efficiency of key retrieval.

like image 877
Joel Cornett Avatar asked Jan 25 '12 04:01

Joel Cornett


People also ask

What algorithm is used in dictionary Hashmap hash table?

The answer is: multiple hash functions can be used depending on compilation arguments and string size.

How does Python dictionary hash?

Hash tables or has maps in Python are implemented through the built-in dictionary data type. The keys of a dictionary in Python are generated by a hashing function. The elements of a dictionary are not ordered and they can be changed.

Do Python dictionaries use hash tables?

Dictionaries in Python are implemented using hash tables. It is an array whose indexes are obtained using a hash function on the keys.

What hash function does Python use for sets?

All of Python's immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id().

What is the hash of a Python dictionary?

The hash that it uses depends on the object being used as a key -- each class can define its own __hash__ () method, and the value that it returns for a particular instance is what is used for the dictionary. Python itself provides the hash implementation for str and tuple types.

Which hashing algorithm does Python use?

There are lots of algorithms Python could use to generate these hashes, like CityHash, MurmurHash, DJBX33A, and many more. So which hashing algorithm does Python use, and more importantly, why does Python use that one?

What is a hash map?

Hash maps are indexed data structures. A hash map makes use of a hash function to compute an index with a key into an array of buckets or slots. Its value is mapped to the bucket with the corresponding index. The key is unique and immutable. Think of a hash map as a cabinet having drawers with labels for the things stored in them.

How do you use HashMap in Python?

Hash Map in Python. Hash maps are indexed data structures. A hash map makes use of a hash function to compute an index with a key into an array of buckets or slots. Its value is mapped to the bucket with the corresponding index. The key is unique and immutable.


1 Answers

The hash that it uses depends on the object being used as a key -- each class can define its own __hash__() method, and the value that it returns for a particular instance is what is used for the dictionary.

Python itself provides the hash implementation for str and tuple types. A quick look at the source should reveal the exact algorithm for those.

The hash of a tuple is based on the hashes of its contents. The algorithm is essentially this (simplified slightly):

def hash(tuple):     mult = 1000003     x = 0x345678     for index, item in enumerate(tuple):         x = ((x ^ hash(item)) * mult) & (1<<32)         mult += (82520 + (len(tuple)-index)*2)     return x + 97531 

For strings, the interpreter also iterates over every character, combining them with this (again, slightly simplified) algorithm:

def hash(string):     x = string[0] << 7     for chr in string[1:]:         x = ((1000003 * x) ^ chr) & (1<<32)     return x 

A bigger issue to worry about is avoiding hash collisions. Colliding hash keys will cause a linear search as the dictionary tries to find a place to store the new object (this is now being recognized as a security issue, and tha behavior may be changing in upcoming python versions)

like image 107
Ian Clelland Avatar answered Sep 23 '22 12:09

Ian Clelland