
Why doesn't Python's hash() function give the same values when run on the Android implementation?

Tags: python, hash, sl4a

I believed that the hash() function works the same way in all Python interpreters, but it gives different results when I run it on my mobile using Python for Android. I get the same hash values for strings and numbers, but when I hash built-in data types the hash values differ.

PC Python Interpreter (Python 2.7.3)

>>> hash(int)
31585118
>>> hash("hello sl4a")
1532079858
>>> hash(101)
101

Mobile Python Interpreter (Python 2.6.2)

>>> hash(int)
-2146549248
>>> hash("hello sl4a")
1532079858
>>> hash(101)
101

Can anyone tell me whether this is a bug or whether I have misunderstood something?

asked Jun 19 '13 13:06 by Balakrishnan



2 Answers

In recent versions (Python 3.3+), hash() of str and bytes objects is randomised by default each time you start a new interpreter, to prevent dictionary-insertion DoS attacks.

Prior to that, hash() already differed between 32-bit and 64-bit builds anyway.
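The per-process randomisation described above can be observed by starting fresh interpreters with different values of the PYTHONHASHSEED environment variable. This is only a sketch for Python 3.3+; the helper name `hash_in_fresh_interpreter` is made up for illustration.

```python
import os
import subprocess
import sys

def hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new interpreter with PYTHONHASHSEED=seed."""
    # Copy the current environment and pin the hash seed for the child only.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out.decode().strip())

if __name__ == "__main__":
    # Same seed: the two child processes should agree.
    print(hash_in_fresh_interpreter("hello sl4a", 0))
    print(hash_in_fresh_interpreter("hello sl4a", 0))
    # "random" is the default behaviour: the value usually changes per run.
    print(hash_in_fresh_interpreter("hello sl4a", "random"))
```

Pinning the seed makes string hashes repeatable across runs of the same build, but it still does not make them comparable across different builds or versions.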

If you want something that hashes to the same value every time, use one of the hashes in hashlib:

>>> import hashlib
>>> hashlib.algorithms
('md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512')
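As a minimal sketch of the hashlib suggestion, the helper below (the name `stable_hash` and the choice of SHA-256 and UTF-8 are assumptions for illustration, not part of the original answer) produces a digest that depends only on the input bytes, so it should match across interpreters, platforms and runs.

```python
import hashlib

def stable_hash(text):
    """Return a hex digest that depends only on the UTF-8 bytes of `text`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

if __name__ == "__main__":
    # The same string gives the same digest on a PC and on a phone.
    print(stable_hash("hello sl4a"))
```

Note that this only works for data you can turn into bytes; it is not a drop-in replacement for hash() on arbitrary objects such as types.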
answered Sep 24 '22 01:09 by John La Rooy


For old Python (at least, my Python 2.7), it seems that

hash(<some type>) = id(<type>) / 16 

and for CPython, id() is the address in memory: http://docs.python.org/2/library/functions.html#id

>>> id(int) / hash(int)
16
>>> id(int) % hash(int)
0

So my guess is that the Android port has some strange convention for memory addresses?

Anyway, given the above, hashes for types (and other built-ins, I guess) will differ across installs because those objects are at different addresses.

In contrast, hashes for values (what I think you mean by "non-internal objects"), before the random stuff was added, are calculated from their values and so are likely repeatable.
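To make the address-based guess concrete, here is a rough sketch of two ways an address could be turned into a hash. `pointer_hash_sketch` mimics the "address divided by 16" behaviour observed above as a 4-bit right rotation. `as_signed` shows how a 32-bit address with its top bit set reads as a negative number. Both function names are hypothetical. That the Android build is 32-bit and hashes by the raw address is an assumption and has not been checked against that port.

```python
def pointer_hash_sketch(addr, bits=64):
    """Rotate `addr` right by 4 bits in a `bits`-wide word and read it as signed."""
    mask = (1 << bits) - 1
    rotated = ((addr >> 4) | (addr << (bits - 4))) & mask
    if rotated >= 1 << (bits - 1):  # reinterpret the word as a signed integer
        rotated -= 1 << bits
    return rotated

def as_signed(value, bits=32):
    """Reinterpret an unsigned `bits`-wide value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value

if __name__ == "__main__":
    # PC value from the question: an address of 31585118 * 16 gives back 31585118.
    print(pointer_hash_sketch(31585118 * 16))
    # Assumption: a raw 32-bit address above 0x80000000 reads as negative.
    print(as_signed(0x800E4200))
```

Under these assumptions the mobile value of -2146549248 would simply be the address 0x800E4200 read as a signed 32-bit number, which would not need any strange addressing convention.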

PS: there's at least one more CPython wrinkle:

>>> for i in range(-1000,1000):
...     if hash(i) != i: print(i)
...
-1

There's an answer here somewhere explaining that one...
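For what it's worth, the usual explanation is that CPython's C-level hash functions use -1 as their error return value, so a genuine hash of -1 is remapped to -2. The short check below reproduces the loop above. It is specific to CPython, and other implementations may behave differently.

```python
# Small integers hash to themselves in CPython, with one exception.
mismatches = [i for i in range(-1000, 1000) if hash(i) != i]
print(mismatches)

# -1 is reserved as an error code at the C level, so hash(-1) is remapped
# and ends up colliding with hash(-2).
print(hash(-1), hash(-2))
```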

answered Sep 22 '22 01:09 by andrew cooke