 

How likely are MD5 false-positive checksums?

Tags: md5, checksum

I have a client who is distributing large binary files internally. They also pass along MD5 checksums of the files and, apparently, verify each file against its checksum before use as part of their workflow.

However, they claim that they "often" encounter corrupted files for which the MD5 check still reports that the file is good.

Everything I've read suggests that this should be hugely unlikely.

Does this sound likely? Would another hashing algorithm provide better results? Should I actually be looking at process problems such as them claiming to check the checksum, but not really doing it?

NB, I don't yet know what "often" means in this context. They are processing hundreds of files a day. I don't know if this is a daily, monthly or yearly occurrence.

Asked by Gareth Simpson, Feb 07 '11 23:02


People also ask

How likely is an MD5 collision?

MD5: The fastest and shortest generated hash (16 bytes). The probability of just two hashes accidentally colliding is approximately 1.47*10^-29.

Is MD5 good for checksum?

A checksum algorithm in this scenario only needs to be 'good enough' to detect unintentional changes to the data. For example, MD5 is perfectly suitable: it is very widely adopted, there is good tool support, and checksums are quick to generate and compare.

Can MD5 be faked?

Yes. Very yes. MD5 is completely broken and no longer suitable for security use. See Wikipedia for a description of ways you can abuse MD5.

Is MD5 same as checksum?

They are not the same thing. MD5 is a checksum, but there are other checksum algorithms that are not MD5, such as SHA, CRC, etc. Generally, a checksum is a function that takes an input larger than its output and, ideally, produces a greatly different output even if only one bit of the input changes.


1 Answer

Sounds like a bug in their use of MD5 (maybe they are MD5-ing the wrong files), or a bug in the library that they're using. For example, an older MD5 program that I used once didn't handle files over 2GB.
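
One way to rule out a tool-side bug is to recompute the checksum independently. The following is a minimal sketch in Python, assuming the standard `hashlib` module is available; the names `file_digest` and `verify` are made up for illustration and are not part of the client's workflow. It reads the file in chunks, so file size (including files over 2GB) should not matter.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    # Binary mode matters: text mode can translate line endings and
    # change the bytes that get hashed.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex, algorithm="md5"):
    """Compare a file's digest with a published checksum string."""
    # Published checksums often carry trailing newlines or uppercase hex.
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Running `verify` against the exact file that is later reported as corrupt, on the machine where it is used, would show whether the published checksum really matches that file. Passing `"sha256"` as `algorithm` gives a second, independent digest for comparison.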

A related question suggests that you would get a collision, on average, once every 100 years if you were generating 6 billion files per second, so it's quite unlikely.
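
That figure can be sanity-checked with the usual birthday approximation. This is a rough sketch only: it assumes MD5 output behaves like a uniformly random 128-bit value, which is reasonable for accidental corruption but not for deliberately crafted files. The function name `collision_probability` is made up for illustration.

```python
import math

HASH_BITS = 128  # MD5 digests are 128 bits (16 bytes)


def collision_probability(n_files, bits=HASH_BITS):
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-n_files * (n_files - 1) / 2 ** (bits + 1))


# The scenario quoted above: 6 billion files per second for 100 years.
files_per_second = 6e9
seconds = 100 * 365.25 * 24 * 3600
n = files_per_second * seconds
print(f"{n:.2e} files -> p = {collision_probability(n):.2f}")

# Something closer to "hundreds of files a day": an assumed 1,000 files
# per day for 10 years.
print(f"p = {collision_probability(1000 * 365 * 10):.1e}")
```

Note that this estimates the chance that any two files in the whole set share a digest. The chance that one specific corrupted file still matches its original checksum is smaller still, about 2^-128 per corruption event under the same assumption.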

Answered by Seth, Oct 26 '22 05:10