I was messing around with making a command line parser and was wondering what kind of hash algorithm python dict's use?
The way I have it set up, I have a pattern match algorithm which matches tokenized input sequences with a dictionary key. Some of the keys are relatively long (length 5 or 6 tuples of 6-7 character strings). I was wondering if there was a point at which long dictionary keys significantly reduce the efficiency of key retrieval.
The answer is: multiple hash functions can be used depending on compilation arguments and string size.
Hash tables or has maps in Python are implemented through the built-in dictionary data type. The keys of a dictionary in Python are generated by a hashing function. The elements of a dictionary are not ordered and they can be changed.
Dictionaries in Python are implemented using hash tables. It is an array whose indexes are obtained using a hash function on the keys.
All of Python's immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id().
The hash that it uses depends on the object being used as a key -- each class can define its own __hash__ () method, and the value that it returns for a particular instance is what is used for the dictionary. Python itself provides the hash implementation for str and tuple types.
There are lots of algorithms Python could use to generate these hashes, like CityHash, MurmurHash, DJBX33A, and many more. So which hashing algorithm does Python use, and more importantly, why does Python use that one?
Hash maps are indexed data structures. A hash map makes use of a hash function to compute an index with a key into an array of buckets or slots. Its value is mapped to the bucket with the corresponding index. The key is unique and immutable. Think of a hash map as a cabinet having drawers with labels for the things stored in them.
Hash Map in Python. Hash maps are indexed data structures. A hash map makes use of a hash function to compute an index with a key into an array of buckets or slots. Its value is mapped to the bucket with the corresponding index. The key is unique and immutable.
The hash that it uses depends on the object being used as a key -- each class can define its own __hash__() method, and the value that it returns for a particular instance is what is used for the dictionary.
Python itself provides the hash implementation for str and tuple types. A quick look at the source should reveal the exact algorithm for those.
The hash of a tuple is based on the hashes of its contents. The algorithm is essentially this (simplified slightly):
def hash(tuple): mult = 1000003 x = 0x345678 for index, item in enumerate(tuple): x = ((x ^ hash(item)) * mult) & (1<<32) mult += (82520 + (len(tuple)-index)*2) return x + 97531
For strings, the interpreter also iterates over every character, combining them with this (again, slightly simplified) algorithm:
def hash(string): x = string[0] << 7 for chr in string[1:]: x = ((1000003 * x) ^ chr) & (1<<32) return x
A bigger issue to worry about is avoiding hash collisions. Colliding hash keys will cause a linear search as the dictionary tries to find a place to store the new object (this is now being recognized as a security issue, and tha behavior may be changing in upcoming python versions)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With