Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Persistent Hashing of Strings in Python

Tags:

How would you convert an arbitrary string into a unique integer, which would be the same across Python sessions and platforms? For example hash('my string') wouldn't work because a different value is returned for each Python session and platform.

like image 414
Cerin Avatar asked Mar 24 '10 20:03

Cerin


People also ask

How are strings hashed in Python?

A hash function is a function that takes input of a variable length sequence of bytes and converts it to a fixed length sequence. It is a one way function. This means if f is the hashing function, calculating f(x) is pretty fast and simple, but trying to obtain x again will take years.

Is Python hash consistent?

As noted by many, Python's hash is not consistent anymore (as of version 3.3), as a random PYTHONHASHSEED is now used by default (to address security concerns, as explained in this excellent answer).

Is string hashing constant time?

A hash function doesn't have to (and can't) return a unique value for every string. You could use the first 10 characters to initialize a random number generator and then use that to pull out 100 random characters from the string, and hash that. This would be constant time.

Which string hashing is best?

If you just want to have a good hash function, and cannot wait, djb2 is one of the best string hash functions i know. it has excellent distribution and speed on many different sets of keys and table sizes. you are not likely to do better with one of the "well known" functions such as PJW, K&R[1], etc.


1 Answers

Use a hash algorithm such as MD5 or SHA1, then convert the hexdigest via int():

>>> import hashlib >>> int(hashlib.md5('Hello, world!').hexdigest(), 16) 144653930895353261282233826065192032313L 
like image 168
Ignacio Vazquez-Abrams Avatar answered Sep 18 '22 14:09

Ignacio Vazquez-Abrams