Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to generate unique equal hash for equal dictionaries?

Tags:

python

eg:

>>> a = {'req_params': {'app': '12345', 'format': 'json'}, 'url_params': {'namespace': 'foo', 'id': 'baar'}, 'url_id': 'rest'}
>>> b = {'req_params': { 'format': 'json','app': '12345' }, 'url_params': { 'id': 'baar' ,'namespace':'foo' }, 'url_id': 'REST'.lower() }
>>> a == b
 True

What is a good hash function to generate equal hashes for both dicts ? The dictionaries will have basic datatypes like int,list,dict,and strings,no other objects.

It would be great if the hash is space optimized, the target set is around 5 million objects, hence collision chances are pretty less.

I am not sure if json.dumps or other serializations pay respect to equality instead of structure of the members in the dictionary. eg. Basic hashing using str of dict does not work :

>>> a = {'name':'Jon','class':'nine'}
>>> b = {'class':'NINE'.lower(),'name':'Jon'}
>>> str(a)
"{'name': 'Jon', 'class': 'nine'}"
>>> str(b)
"{'class': 'nine', 'name': 'Jon'}"

json.dumps does not work either :

>>> import json,hashlib
>>> a = {'name':'Jon','class':'nine'}
>>> b = {'class':'NINE'.lower(),'name':'Jon'}
>>> a == b
True
>>> ha = hashlib.sha256(json.dumps(a)).hexdigest()
>>> hb = hashlib.sha256(json.dumps(b)).hexdigest()
>>> ha
'545af862cc4d2dd1926fe0aa1e34ad5c3e8a319461941b33a47a4de9dbd7b5e3'
>>> hb 
'4c7d8dbbe1f180c7367426d631410a175d47fff329d2494d80a650dde7bed5cb'
like image 347
DhruvPathak Avatar asked May 24 '13 13:05

DhruvPathak


1 Answers

The pprint module sorts the dict keys

from pprint import pformat
hash(pformat(a)) == hash(pformat(b))

If you want to persist the hashes, you should use a hash from hashlib. sha1 is plenty

like image 118
John La Rooy Avatar answered Nov 04 '22 04:11

John La Rooy