Possible to calculate MD5 (or other) hash with buffered reads?

I need to calculate checksums of quite large files (gigabytes). This can be accomplished using the following method:

    private byte[] calcHash(string file)
    {
        // "using" disposes the hash object and closes the file even if an exception is thrown
        using (System.Security.Cryptography.HashAlgorithm ha = System.Security.Cryptography.MD5.Create())
        using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read))
        {
            return ha.ComputeHash(fs);
        }
    }

However, the files are normally written just beforehand in a buffered manner (say, 32 MB at a time). I am convinced I once saw an overload of a hash function that let me calculate an MD5 (or other) hash at the same time as writing, i.e. calculating the hash of one buffer, then feeding that resulting hash into the next iteration.

Something like this (pseudocode):

    byte[] hash = new byte[16];   // MD5 digests are 16 bytes
    while (!eof)
    {
        buffer = readFromSourceFile();
        writefile(buffer);
        hash = calchash(buffer, hash);   // hypothetical overload: hash this buffer, seeded with the previous hash
    }

At the end, hash would be identical to what calcHash returns for the entire file.

Now, I can't find any overload like that in the .NET 3.5 Framework. Am I dreaming? Has it never existed, or am I just bad at searching? I want to write and checksum at the same time because, with files this large, reading each one back a second time just to hash it is expensive.

asked Jan 23 '10 at 19:01 by sindre j


2 Answers

You can use the TransformBlock and TransformFinalBlock methods of HashAlgorithm to process the data in chunks.

    // Init
    MD5 md5 = MD5.Create();
    int offset = 0;   // running total of bytes hashed (TransformBlock returns the count processed)

    // For each block:
    offset += md5.TransformBlock(block, 0, block.Length, block, 0);

    // For last block:
    md5.TransformFinalBlock(block, 0, block.Length);

    // Get the hash code
    byte[] hash = md5.Hash;

Note: It works (at least with the MD5 provider) to send all blocks to TransformBlock and then send an empty block to TransformFinalBlock to finalise the process.
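Applied to the question's write loop, the idea might look like the minimal sketch below. It assumes the data arrives from some source Stream; the names HashingCopier and CopyAndHash are hypothetical. Unlike the snippet above, it hashes only the bytes actually read on each pass (a Read call can return fewer than buffer.Length bytes), and it finalises with an empty block as described in the note.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class HashingCopier
{
    // Copies source to destination in chunks of bufferSize bytes and returns the
    // MD5 of everything copied, feeding each chunk to TransformBlock as it goes.
    public static byte[] CopyAndHash(Stream source, Stream destination, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        using (MD5 md5 = MD5.Create())
        {
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                destination.Write(buffer, 0, read);
                // Hash only the bytes actually read; a null output buffer skips the copy
                md5.TransformBlock(buffer, 0, read, null, 0);
            }
            // Finalise with an empty block; md5.Hash is valid only after this call
            md5.TransformFinalBlock(new byte[0], 0, 0);
            return md5.Hash;
        }
    }
}
```

Because the same buffer is passed to both Write and TransformBlock, each chunk is read once and the file is written and hashed in a single pass.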

answered Sep 28 '22 at 02:09 by Guffa


I like the answer above, but for completeness, and as a more general solution, have a look at the CryptoStream class. If you are already working with streams, it is easy to wrap your stream in a CryptoStream, passing a HashAlgorithm as the ICryptoTransform parameter.

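    // FileMode.Create (not Open) so a new file is written, or an existing one truncated
    using (var file = new FileStream("foo.txt", FileMode.Create, FileAccess.Write))
    using (var md5 = MD5.Create())
    using (var cs = new CryptoStream(file, md5, CryptoStreamMode.Write))
    {
        while (notDoneYet)                     // placeholder loop condition
        {
            byte[] buffer = Get32MB();         // placeholder: next chunk of data
            cs.Write(buffer, 0, buffer.Length);
        }
        cs.FlushFinalBlock();                  // finish hashing; md5.Hash is valid from here
        System.Console.WriteLine(BitConverter.ToString(md5.Hash));
    }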

You need to call FlushFinalBlock() (or close the CryptoStream) before reading md5.Hash, so that the HashAlgorithm knows the data is complete; reading Hash before then throws an exception.
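To check the CryptoStream approach end to end, here is a minimal sketch with the placeholders (notDoneYet, Get32MB) replaced by an in-memory sequence of chunks. The names CryptoStreamHasher and WriteAndHash are hypothetical. It relies on HashAlgorithm passing its input through unchanged, so the destination receives exactly the bytes that were hashed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class CryptoStreamHasher
{
    // Writes every chunk to destination through a CryptoStream and returns the MD5
    // of all bytes written.
    // Caution: disposing the CryptoStream also closes destination.
    public static byte[] WriteAndHash(Stream destination, IEnumerable<byte[]> chunks)
    {
        using (MD5 md5 = MD5.Create())
        using (CryptoStream cs = new CryptoStream(destination, md5, CryptoStreamMode.Write))
        {
            foreach (byte[] chunk in chunks)
            {
                cs.Write(chunk, 0, chunk.Length);
            }
            // Process the final block; md5.Hash can only be read after this
            cs.FlushFinalBlock();
            return md5.Hash;
        }
    }
}
```

Passing a FileStream opened with FileMode.Create as the destination gives the single-pass write-and-hash the question asks for.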

answered Sep 28 '22 at 03:09 by pomeroy