Faster MD5 alternative?

I'm working on a program that searches entire drives for a given file. At the moment, I calculate an MD5 hash for the known file and then scan all files recursively, looking for a match.

The only problem is that MD5 is painfully slow on large files. Is there a faster alternative that I can use while retaining a very small probability of false positives?

All code is in C#.

Thank you.

Update

I've read that even MD5 can be pretty quick and that disk I/O should be the limiting factor. That leads me to believe that my code might not be optimal. Are there any problems with this approach?

        MD5 md5 = MD5.Create();
        StringBuilder sb = new StringBuilder();
        try
        {
            using (FileStream fs = File.Open(fileName, FileMode.Open, FileAccess.Read))
            {
                foreach (byte b in md5.ComputeHash(fs))
                    sb.Append(b.ToString("X2"));
            }
            return sb.ToString();
        }
        catch (Exception)
        {
            return "";
        }

asked Nov 13 '08 23:11

Paul Beesley

1 Answer

I hope you're checking for an MD5 match only if the file size already matches.

Another optimization is to do a quick checksum of the first 1K (or some other arbitrary, but reasonably small, number of bytes) and make sure those match before hashing the whole file.

Of course, all this assumes that you're just looking for a match/no-match decision for a particular file.
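
To make the ordering of those checks concrete, here is a minimal sketch rather than a definitive implementation. The class name `KnownFile`, the method `Matches`, and the 1024-byte `PrefixLength` are hypothetical names chosen for illustration. It also compares the first bytes directly instead of computing a checksum over them, which is a substitution that serves the same purpose for such a small block.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper: caches what is known about the target file once,
// then tests candidates from the cheapest check to the most expensive.
sealed class KnownFile
{
    // Assumed prefix size; any reasonably small number would do.
    const int PrefixLength = 1024;

    readonly long length;
    readonly byte[] prefix;
    readonly byte[] hash;

    public KnownFile(string path)
    {
        length = new FileInfo(path).Length;
        prefix = ReadPrefix(path);
        hash = HashFile(path);
    }

    public bool Matches(string candidatePath)
    {
        // 1. File size comes from file system metadata; no content is read.
        if (new FileInfo(candidatePath).Length != length)
            return false;

        // 2. Compare only the first PrefixLength bytes (or fewer for short files).
        if (!ReadPrefix(candidatePath).SequenceEqual(prefix))
            return false;

        // 3. Only now pay for a full read and MD5 of the candidate.
        return HashFile(candidatePath).SequenceEqual(hash);
    }

    // Reads up to PrefixLength bytes from the start of the file.
    static byte[] ReadPrefix(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            byte[] buffer = new byte[PrefixLength];
            int total = 0, read;
            while (total < buffer.Length &&
                   (read = fs.Read(buffer, total, buffer.Length - total)) > 0)
            {
                total += read;
            }
            Array.Resize(ref buffer, total);
            return buffer;
        }
    }

    // Hashes the whole file, disposing both the hash object and the stream.
    static byte[] HashFile(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream fs = File.OpenRead(path))
        {
            return md5.ComputeHash(fs);
        }
    }
}
```

A scanner would construct one `KnownFile` for the target and call `Matches` on each path it visits. Most candidates should fail the size test, so the full hash is computed only for the few files that pass both cheaper checks.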

answered Sep 18 '22 13:09

Michael Burr

Tags: c#, hash, md5
