
Faster MD5 alternative?

Tags: c#, hash, md5

I'm working on a program that searches entire drives for a given file. At the moment, I calculate an MD5 hash for the known file and then scan all files recursively, looking for a match.

The only problem is that MD5 is painfully slow on large files. Is there a faster alternative that I can use while retaining a very small probability of false positives?

All code is in C#.

Thank you.

Update

I've read that even MD5 can be pretty quick and that disk I/O should be the limiting factor. That leads me to believe that my code might not be optimal. Are there any problems with this approach?

        MD5 md5 = MD5.Create();
        StringBuilder sb = new StringBuilder();
        try
        {
            using (FileStream fs = File.Open(fileName, FileMode.Open, FileAccess.Read))
            {
                foreach (byte b in md5.ComputeHash(fs))
                    sb.Append(b.ToString("X2"));
            }
            return sb.ToString();
        }
        catch (Exception)
        {
            return "";
        }
Paul Beesley asked Nov 13 '08 23:11



1 Answer

I hope you're checking for an MD5 match only if the file size already matches.
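A minimal sketch of that size pre-filter, assuming .NET Core 2.1 or later (for `EnumerationOptions`). The names `SizeFilter` and `SameSizeCandidates` are made up for illustration and are not from the question. Only the files it returns would need to be hashed at all.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SizeFilter
{
    // Lazily yields every file under 'root' whose length equals the known
    // file's length. The known file itself is included if it lives under 'root'.
    public static IEnumerable<string> SameSizeCandidates(string knownFile, string root)
    {
        long targetLength = new FileInfo(knownFile).Length;

        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,
            IgnoreInaccessible = true, // skip directories we are not allowed to read
            AttributesToSkip = 0       // assumption: hidden and system files should be scanned too
        };

        foreach (string path in Directory.EnumerateFiles(root, "*", options))
        {
            long length;
            try
            {
                length = new FileInfo(path).Length;
            }
            catch (IOException) { continue; }                // file vanished or is unreadable
            catch (UnauthorizedAccessException) { continue; }

            if (length == targetLength)
                yield return path;
        }
    }
}
```

Reading a file's length touches only file system metadata, so this stage costs far less than reading file contents.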

Another optimization is to do a quick checksum of the first 1 KB (or some other arbitrary but reasonably small amount) and make sure the checksums match before hashing the whole file.
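One way the staged check could look, as a sketch rather than a definitive implementation. `KnownFileMatcher` and its members are hypothetical names. Instead of a checksum of the first 1 KB, it compares the prefix bytes directly, which serves the same purpose when the known file is at hand. The known file's length, prefix, and MD5 are computed once and reused for every candidate.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public sealed class KnownFileMatcher
{
    private const int PrefixLength = 1024; // arbitrary, reasonably small

    private readonly long length;
    private readonly byte[] prefix;
    private readonly byte[] hash;

    public KnownFileMatcher(string knownFile)
    {
        length = new FileInfo(knownFile).Length;
        prefix = ReadPrefix(knownFile);
        hash = HashFile(knownFile);
    }

    // Cheapest checks first: length, then the first 1 KB, then the full MD5.
    public bool Matches(string candidate)
    {
        if (new FileInfo(candidate).Length != length) return false;
        if (!BytesEqual(ReadPrefix(candidate), prefix)) return false;
        return BytesEqual(HashFile(candidate), hash);
    }

    // Reads up to PrefixLength bytes from the start of the file.
    private static byte[] ReadPrefix(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            byte[] buffer = new byte[PrefixLength];
            int total = 0, read;
            while (total < buffer.Length &&
                   (read = fs.Read(buffer, total, buffer.Length - total)) > 0)
            {
                total += read;
            }
            Array.Resize(ref buffer, total);
            return buffer;
        }
    }

    private static byte[] HashFile(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream fs = File.OpenRead(path))
        {
            return md5.ComputeHash(fs);
        }
    }

    private static bool BytesEqual(byte[] a, byte[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}
```

Usage would be along the lines of `var matcher = new KnownFileMatcher(knownPath);` followed by `matcher.Matches(path)` for each file found during the scan. Most non-matching files are rejected after reading at most 1 KB, so the full hash runs only on the few candidates that survive the first two checks.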

Of course, all this assumes that you're just looking for a match/no-match decision for a particular file.

Michael Burr answered Sep 18 '22 13:09