
Recursive MD5 and probability of collision

I wonder if it is 'safe' to hash a bunch of MD5 hash values together to create a new hash or whether this will in any way increase the probability of collisions.

The background: I have a couple of files with dependencies. Each file has an associated hash value which is calculated from its content. Let's call this the 'single-file' hash value. In addition, each file should also have a hash value that covers all the files it depends on, the 'multi-file' hash value.

So the question is: can I just take the single-file MD5 hash values of all the dependent files, concatenate them, and then calculate an MD5 over the concatenated values to get the multi-file hash value? Or will this result in an MD5 hash that is more likely to collide than if I concatenated the contents of all dependent files and hashed that?
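To make the first option concrete, here is a minimal Python sketch of what I have in mind. The function names `single_file_hash` and `multi_file_hash` are made up for illustration, and the file contents are passed in as byte strings rather than read from disk.

```python
import hashlib
from typing import Iterable


def single_file_hash(content: bytes) -> bytes:
    """Return the raw 16-byte MD5 digest of one file's content."""
    return hashlib.md5(content).digest()


def multi_file_hash(contents: Iterable[bytes]) -> str:
    """Hash the concatenation of the single-file digests.

    The order of `contents` matters: the same files in a different
    order produce a different multi-file hash.
    """
    outer = hashlib.md5()
    for content in contents:
        # Feed each fixed-length digest into the outer hash in turn,
        # which is equivalent to hashing their concatenation.
        outer.update(single_file_hash(content))
    return outer.hexdigest()


if __name__ == "__main__":
    print(multi_file_hash([b"file one", b"file two"]))
```

Because every digest has the same length (16 bytes), the boundaries between the inputs are unambiguous, which is not automatically true when raw file contents are concatenated.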

Alternatively, could I xor the single-file hash values together to generate a multi-file hash value, or would this likely result in more collisions?
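A small Python sketch of the xor variant, with a hypothetical helper named `xor_hash`, shows two structural properties that are independent of MD5 itself: the result does not depend on the order of the files, and any digest that appears an even number of times cancels out.

```python
import hashlib
from typing import Iterable


def xor_hash(contents: Iterable[bytes]) -> str:
    """XOR the MD5 digests of all inputs into one 16-byte value."""
    acc = bytes(16)  # 16 zero bytes, the identity element for XOR
    for content in contents:
        digest = hashlib.md5(content).digest()
        acc = bytes(x ^ y for x, y in zip(acc, digest))
    return acc.hex()


if __name__ == "__main__":
    # Order does not matter.
    print(xor_hash([b"a", b"b"]) == xor_hash([b"b", b"a"]))
    # A file listed twice cancels itself out.
    print(xor_hash([b"a", b"x", b"x"]) == xor_hash([b"a"]))
```

So with xor, two different dependency sets can produce the same multi-file value without any MD5 collision being involved.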

Janick Bernet asked Sep 18 '11 12:09


2 Answers

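Sounds like you need a Merkle tree: each node's hash is computed from the hashes of its children, so the root hash changes whenever anything underneath it changes.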
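As a rough illustration of the idea, here is a minimal Python sketch of a binary hash tree over a flat list of file contents. The names `md5_digest` and `merkle_root` are hypothetical, and the handling of an odd node (hashing it on its own) is just one possible convention, not a standard. For files with dependencies, the same principle would apply along the dependency graph instead of a flat list.

```python
import hashlib
from typing import List


def md5_digest(data: bytes) -> bytes:
    """Return the raw 16-byte MD5 digest of `data`."""
    return hashlib.md5(data).digest()


def merkle_root(leaves: List[bytes]) -> str:
    """Compute the root hash of a binary hash tree over `leaves`."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    # Bottom level: one digest per leaf (the 'single-file' hashes).
    level = [md5_digest(leaf) for leaf in leaves]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            # Combine two neighbouring digests; a trailing odd node
            # is hashed on its own (an assumption of this sketch).
            pair = level[i:i + 2]
            next_level.append(md5_digest(b"".join(pair)))
        level = next_level
    return level[0].hex()


if __name__ == "__main__":
    print(merkle_root([b"file one", b"file two", b"file three"]))
```

Storing the intermediate digests as well would make it possible to tell which subtree changed, instead of only noticing that the root differs.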

James answered Oct 07 '22 11:10


MD5 has well-known collision weaknesses; see the MD5 entry on Wikipedia.

However, if you use MD5 not for security but as a unique marker to check dependencies, even hashing concatenated hashes should be pretty safe.

Or, if it's not too late, switch to SHA-1.

squadette answered Oct 07 '22 10:10