How can I compute two hashes without reading the same file twice?

Tags: c#, hash, md5, sha1

I have a program which is going to be used on very large files (current test data is 250GB). I need to be able to calculate both MD5 and SHA1 hashes for these files. Currently my code drops the stream into MD5.Create().ComputeHash(Stream stream), and then the same for SHA1. These, as far as I can tell, read the file in 4096-byte blocks to a buffer internal to the hashing function, until the end of the stream.

The problem is, doing this one after the other takes a VERY long time! Is there any way I can take data into a buffer and provide the buffer to BOTH algorithms before reading a new block into the buffer?

Please explain thoroughly as I'm not an experienced coder.

Joash Lewis asked Dec 27 '22 09:12



1 Answer

Sure. You can call TransformBlock repeatedly, and then TransformFinalBlock at the end and then use Hash to get the final hash. So something like:

// Requires: using System.IO; using System.Security.Cryptography;
using (var md5 = MD5.Create()) // Or MD5Cng.Create
using (var sha1 = SHA1.Create()) // Or SHA1Cng.Create
using (var input = File.OpenRead("file.data"))
{
    byte[] buffer = new byte[8192];
    int bytesRead;
    // Read one block, then feed that same block to both hash algorithms.
    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        md5.TransformBlock(buffer, 0, bytesRead, buffer, 0);
        sha1.TransformBlock(buffer, 0, bytesRead, buffer, 0);
    }
    // We have to call TransformFinalBlock, but we don't have any
    // more data - just provide 0 bytes.
    md5.TransformFinalBlock(buffer, 0, 0);
    sha1.TransformFinalBlock(buffer, 0, 0);

    // Hash is only valid after TransformFinalBlock has been called.
    byte[] md5Hash = md5.Hash;
    byte[] sha1Hash = sha1.Hash;
}

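Note that MD5Cng.Create and SHA1Cng.Create are static methods inherited from MD5 and SHA1, so they return the same objects as MD5.Create and SHA1.Create. To get the wrappers around the native CNG implementations, construct them directly with new MD5Cng() and new SHA1Cng() (available on .NET Framework). These may be faster than the implementations returned by MD5.Create and SHA1.Create, but they will be a bit less portable (e.g. for PCLs).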
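If you want to reuse the loop for more than one file, it could be wrapped in a small helper along these lines. This is only a sketch: the DualHash class, the Compute method, the hex formatting and the 1 MiB default buffer size are my own illustrative choices, not part of the framework, and the tuple return type assumes C# 7 or later. The buffer size is an arbitrary starting point, so measure on your own hardware.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class DualHash
{
    // Hypothetical helper: reads the stream once and returns both
    // digests as lowercase hex strings.
    public static (string Md5, string Sha1) Compute(Stream input, int bufferSize = 1 << 20)
    {
        using (var md5 = MD5.Create())
        using (var sha1 = SHA1.Create())
        {
            byte[] buffer = new byte[bufferSize];
            int bytesRead;
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // A null output buffer tells TransformBlock not to copy
                // the input anywhere; it only updates the hash state.
                md5.TransformBlock(buffer, 0, bytesRead, null, 0);
                sha1.TransformBlock(buffer, 0, bytesRead, null, 0);
            }

            // Finish both hashes with an empty final block.
            md5.TransformFinalBlock(buffer, 0, 0);
            sha1.TransformFinalBlock(buffer, 0, 0);

            return (ToHex(md5.Hash), ToHex(sha1.Hash));
        }
    }

    // Formats a byte array as lowercase hex without separators.
    static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }
}
```

You would call it with something like DualHash.Compute(File.OpenRead("file.data")), ideally with the stream inside its own using statement so the file handle is released when hashing finishes.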

Jon Skeet answered Jan 19 '23 00:01
