I want to compute an md5 hash not of a string, but of an entire data structure. I understand the mechanics of a way to do this (dispatch on the type of the value, canonicalize dictionary key order and other randomness, recurse into sub-values, etc). But it seems like the kind of operation that would be generally useful, so I'm surprised I need to roll this myself.
Is there some simpler way in Python to achieve this?
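For reference, the hand-rolled approach described above (dispatch on type, canonicalize ordering, recurse) might look roughly like the following. This is only a sketch: `canonical_bytes` and `md5_of_structure` are hypothetical names, the one-letter type tags are an arbitrary choice, and only a handful of built-in types are covered.

```python
import hashlib

def canonical_bytes(value):
    """Hypothetical helper: encode a value deterministically, tagged by type."""
    if value is None:
        return b"N;"
    if isinstance(value, bool):  # checked before int, since bool subclasses int
        return b"B1;" if value else b"B0;"
    if isinstance(value, int):
        return b"I" + str(value).encode("ascii") + b";"
    if isinstance(value, float):
        return b"F" + repr(value).encode("ascii") + b";"
    if isinstance(value, str):
        data = value.encode("utf-8")
        return b"S" + str(len(data)).encode("ascii") + b":" + data
    if isinstance(value, bytes):
        return b"Y" + str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, (list, tuple)):
        # Sequences keep their order; lists and tuples are not distinguished.
        return b"L" + b"".join(canonical_bytes(v) for v in value) + b"E"
    if isinstance(value, dict):
        # Sort by the encoded key so mixed key types never get compared directly.
        items = sorted((canonical_bytes(k), canonical_bytes(v)) for k, v in value.items())
        return b"D" + b"".join(k + v for k, v in items) + b"E"
    if isinstance(value, (set, frozenset)):
        return b"T" + b"".join(sorted(canonical_bytes(v) for v in value)) + b"E"
    raise TypeError("unsupported type: " + type(value).__name__)

def md5_of_structure(value):
    """Hypothetical helper: MD5 hex digest of the canonical encoding."""
    return hashlib.md5(canonical_bytes(value)).hexdigest()

# Key order and set order do not affect the result.
print(md5_of_structure({"a": 1, "b": [1, 2]}) == md5_of_structure({"b": [1, 2], "a": 1}))
```

Sorting on the encoded bytes rather than on the keys themselves is a design choice that sidesteps Python 3's refusal to order values of different types.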
UPDATE: pickle has been suggested, and it's a good idea, but pickling doesn't canonicalize dictionary key order:
>>> import cPickle as pickle
>>> import hashlib, random
>>> for i in range(10):
...     k = [i*i for i in range(1000)]
...     random.shuffle(k)
...     d = dict.fromkeys(k, 1)
...     p = pickle.dumps(d)
...     print hashlib.md5(p).hexdigest()
...
51b5855799f6d574c722ef9e50c2622b
43d6b52b885f4ecb4b4be7ecdcfbb04e
e7be0e6d923fe1b30c6fbd5dcd3c20b9
aebb2298be19908e523e86a3f3712207
7db3fe10dcdb70652f845b02b6557061
43945441efe82483ba65fda471d79254
8e4196468769333d170b6bb179b4aee0
951446fa44dba9a1a26e7df9083dcadf
06b09465917d3881707a4909f67451ae
386e3f08a3c1156edd1bd0f3862df481
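The same effect can be reproduced on Python 3, where it is easier to see why it happens. The sketch below assumes Python 3.7 or later, where dictionaries preserve insertion order and pickle writes items in that order; the variable names are illustrative only.

```python
import hashlib
import pickle

d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}  # same contents, different insertion order

h1 = hashlib.md5(pickle.dumps(d1)).hexdigest()
h2 = hashlib.md5(pickle.dumps(d2)).hexdigest()

print(d1 == d2)  # True: the dictionaries compare equal
print(h1 == h2)  # False: their pickles, and so their hashes, differ
```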
Each MD5 hash is written as 32 hexadecimal digits. Each hexadecimal digit represents four bits, so the digest is 32 × 4 = 128 bits long. A byte is eight bits, so two hexadecimal digits form one byte, and 32 hexadecimal digits equal 16 bytes.
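The arithmetic can be checked directly with `hashlib`. The input `b"hello world"` is just an arbitrary example value.

```python
import hashlib

digest = hashlib.md5(b"hello world")
raw = digest.digest()      # the digest as raw bytes
text = digest.hexdigest()  # the same digest as hexadecimal text

# 16 bytes, 32 hex digits, 128 bits
print(len(raw), len(text), len(raw) * 8)
```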
In Java, call MessageDigest.getInstance("MD5") to get an MD5 instance of MessageDigest. Then compute the hash by doing one of: feed the entire input as a byte[] and calculate the hash in one operation with md.
What is the MD5 Algorithm? MD5 (Message-Digest Algorithm 5) is a cryptographic hash algorithm that generates a 128-bit digest from an input of any length. The digest is usually written as a 32-digit hexadecimal number. Ronald Rivest designed the algorithm in 1991 to provide the means for digital signature verification.
MD5 is a cryptographic hash algorithm. It produces a 128-bit hash value, conventionally displayed in hexadecimal. Like other hash functions, it takes a piece of data and derives a fixed-size value that can be used as a key or stand-in for the original.
json.dumps() can sort dictionaries by key. So you don't need other dependencies:
import hashlib
import json

data = ['only', 'lists', [1,2,3], 'dictionaries', {'a':0,'b':1}, 'numbers', 47, 'strings']
data_md5 = hashlib.md5(json.dumps(data, sort_keys=True).encode('utf-8')).hexdigest()
print(data_md5)
Prints:
87e83d90fc0d03f2c05631e2cd68ea02
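To confirm that this addresses the key-order problem from the question, here is a small sketch that wraps the same call in a function and compares two structures that differ only in dictionary insertion order. `md5_of_json` is a hypothetical helper name, not part of any library.

```python
import hashlib
import json

def md5_of_json(value):
    """Hypothetical helper: MD5 of the key-sorted JSON encoding of value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.md5(encoded).hexdigest()

a = {"x": 1, "y": [1, 2, {"k": "v", "j": None}]}
b = {"y": [1, 2, {"j": None, "k": "v"}], "x": 1}  # same data, different key order

print(md5_of_json(a) == md5_of_json(b))            # True: nested key order is ignored
print(md5_of_json([1, 2]) == md5_of_json([2, 1]))  # False: list order still matters
```

Two caveats apply to this approach. It only works for values that `json` can serialize, and JSON coerces non-string keys to strings, so `{1: "a"}` and `{"1": "a"}` produce the same hash.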
bencode sorts dictionaries so:
import hashlib
import bencode

data = ['only', 'lists', [1,2,3], 'dictionaries', {'a':0,'b':1}, 'numbers', 47, 'strings']
data_md5 = hashlib.md5(bencode.bencode(data)).hexdigest()
print(data_md5)
prints:
af1b88ca9fd8a3e828b40ed1b9a2cb20