Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is any substring of a hash (md5, sha1) more "random" than another?

Here's 3 example md5 hashes

$ md5 -s "1" && md5 -s "2" && md5 -s "3"
MD5 ("1") = c4ca4238a0b923820dcc509a6f75849b
MD5 ("2") = c81e728d9d4c2f636f067f89cc14862c
MD5 ("3") = eccbc87e4b5ce2fe28308fd9f2a7baf3

Say I wanted to take 8 characters from any hash. Is the beginning part of the hash particularly more "random" than the end? middle? Or are all substrings equally "random"?

like image 937
maček Avatar asked Sep 29 '10 07:09

maček


3 Answers

I was curious myself, so I went ahead and wrote a program to test this. You'll need Crypto++ to compile the code.

Disclaimer: When it comes to cryptography, or even just mathematics in general, I know just enough to shoot myself in the foot. So, take the following results with a grain of salt and keep in mind that I only have a cursory knowledge of the tools I'm using.

I only sampled three substrings: the first 8 bytes, the middle 8 bytes, and the last 8 bytes. Long story short, they're equally random.

However, when using a smaller sample space, it appears as if the last 8 bits are slightly more random. The larger the sampling space, the closer all three substrings approach complete randomness.


1000 iterations:

First:  0.995914
Middle: 0.996546
Last:   0.998104

5000 iterations:

First:  0.998387
Middle: 0.998624
Last:   0.999501

10000 iterations:

First:  0.999614
Middle: 0.999457
Last:   1

30000 iterations:

First:  1
Middle: 1
Last:   1

"Randomness" is measured by Crypto++'s MaurerRandomnessTest class. For reference, the executable compiled from the above code has a randomness value of 0.632411 and a copy of Shakespeare's Macbeth downloaded from Project Gutenburg has a randomness value of 0.566991.

like image 86
kurige Avatar answered Oct 19 '22 08:10

kurige


Nitpick: "random" is the wrong word to use here, since hash functions are deterministic.

As for answering what you mean :), a desirable property of hash functions is achieving the Avalanche effect: basically, to have every bit of input cause drastic changes to the output. So, for a well-designed hash, every substring should be affected equally often ("be as random") as any other.

like image 14
snemarch Avatar answered Oct 19 '22 10:10

snemarch


All substrings of a good hash (and md5 is reasonably good despite being cryptographically unsafe) are equally random, so yes, take any bits you like from the string, they should be equally distributed.

like image 12
Gintautas Miliauskas Avatar answered Oct 19 '22 10:10

Gintautas Miliauskas