
Computing an md5 hash of a data structure

I want to compute an md5 hash not of a string, but of an entire data structure. I understand the mechanics of one way to do this (dispatch on the type of the value, canonicalize dictionary key order and other sources of nondeterminism, recurse into sub-values, etc.). But it seems like the kind of operation that would be generally useful, so I'm surprised I need to roll this myself.

Is there some simpler way in Python to achieve this?

UPDATE: pickle has been suggested, and it's a good idea, but pickling doesn't canonicalize dictionary key order:

>>> import cPickle as pickle
>>> import hashlib, random
>>> for i in range(10):
...     k = [i*i for i in range(1000)]
...     random.shuffle(k)
...     d = dict.fromkeys(k, 1)
...     p = pickle.dumps(d)
...     print hashlib.md5(p).hexdigest()
...
51b5855799f6d574c722ef9e50c2622b
43d6b52b885f4ecb4b4be7ecdcfbb04e
e7be0e6d923fe1b30c6fbd5dcd3c20b9
aebb2298be19908e523e86a3f3712207
7db3fe10dcdb70652f845b02b6557061
43945441efe82483ba65fda471d79254
8e4196468769333d170b6bb179b4aee0
951446fa44dba9a1a26e7df9083dcadf
06b09465917d3881707a4909f67451ae
386e3f08a3c1156edd1bd0f3862df481
Ned Batchelder asked Mar 24 '11 10:03



2 Answers

json.dumps() can sort dictionaries by key, so the standard library is enough and you don't need other dependencies:

import hashlib
import json

data = ['only', 'lists', [1,2,3], 'dictionaries', {'a':0,'b':1}, 'numbers', 47, 'strings']
data_md5 = hashlib.md5(json.dumps(data, sort_keys=True).encode('utf-8')).hexdigest()

print(data_md5)

Prints:

87e83d90fc0d03f2c05631e2cd68ea02 
webwurst answered Sep 30 '22 08:09
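
As a minimal sketch of the json.dumps() approach above, the call can be wrapped in a helper and checked against two dictionaries built in different key orders. The function name md5_of_structure is hypothetical, and the explicit separators argument is an added assumption that pins the whitespace, so the digests will differ from the one printed in the answer.

```python
import hashlib
import json


def md5_of_structure(data):
    """Return the MD5 hex digest of a JSON-serializable structure (hypothetical helper)."""
    # sort_keys=True fixes dictionary key order at every nesting level;
    # explicit separators pin the whitespace so the text form is stable.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Same content, different insertion order.
a = {"b": 1, "a": [1, 2, {"y": 0, "x": 1}]}
b = {"a": [1, 2, {"x": 1, "y": 0}], "b": 1}

print(md5_of_structure(a) == md5_of_structure(b))  # True
```

This only covers what JSON can represent: tuples are encoded as lists, non-string dictionary keys such as 1 are turned into strings, and types like set raise TypeError unless a default= handler is supplied.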


bencode (a third-party package) sorts dictionary keys when encoding, so:

import hashlib
import bencode

data = ['only', 'lists', [1,2,3], 'dictionaries', {'a':0,'b':1}, 'numbers', 47, 'strings']
data_md5 = hashlib.md5(bencode.bencode(data)).hexdigest()

print(data_md5)

prints:

af1b88ca9fd8a3e828b40ed1b9a2cb20 
Dan D. answered Sep 30 '22 10:09
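
For values that neither JSON nor bencode can represent (sets, bytes, non-string keys), the hand-rolled approach described in the question might look roughly like the following. This is only a sketch under assumptions: the names structure_md5 and _update and the one-letter type tags are made up for illustration, lists and tuples are deliberately treated as the same thing, and it targets Python 3.

```python
import hashlib


def _update(h, value):
    """Feed a type-tagged, order-independent encoding of value into hash object h."""
    if value is None:
        h.update(b"N;")
    elif isinstance(value, bool):  # test bool before int: bool is a subclass of int
        h.update(b"B1;" if value else b"B0;")
    elif isinstance(value, int):
        h.update(b"I%d;" % value)
    elif isinstance(value, float):
        h.update(b"F" + repr(value).encode("ascii") + b";")
    elif isinstance(value, str):
        raw = value.encode("utf-8")
        h.update(b"S%d:" % len(raw) + raw)  # length prefix avoids ambiguity
    elif isinstance(value, bytes):
        h.update(b"Y%d:" % len(value) + value)
    elif isinstance(value, (list, tuple)):  # assumption: list and tuple hash alike
        h.update(b"L%d:" % len(value))
        for item in value:
            _update(h, item)
    elif isinstance(value, (set, frozenset)):
        # Elements may not be mutually comparable, so sort their digests instead.
        digests = sorted(structure_md5(item) for item in value)
        h.update(b"E%d:" % len(digests) + "".join(digests).encode("ascii"))
    elif isinstance(value, dict):
        # Canonicalize key order by sorting on the digest of each key.
        entries = sorted((structure_md5(k), structure_md5(v)) for k, v in value.items())
        h.update(b"D%d:" % len(entries))
        for key_digest, value_digest in entries:
            h.update((key_digest + value_digest).encode("ascii"))
    else:
        raise TypeError("unsupported type: %s" % type(value).__name__)


def structure_md5(value):
    """Return the MD5 hex digest of a nested structure of the supported types."""
    h = hashlib.md5()
    _update(h, value)
    return h.hexdigest()


print(structure_md5({"a": {1, 2, 3}, 2: ("x", b"y", None)}))
```

Sorting on digests rather than on the keys themselves sidesteps the fact that Python 3 cannot order mixed-type keys, at the cost of hashing each key and value separately.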