
Computing MD5SUM of large files in C#

I am using the following code to compute the MD5SUM of a file:

byte[] b = System.IO.File.ReadAllBytes(file);
string sum = BitConverter.ToString(new MD5CryptoServiceProvider().ComputeHash(b));

This works fine normally, but if I encounter a large file (~1GB) - e.g. an iso image or a DVD VOB file - I get an Out of Memory exception.

However, I can compute the MD5SUM of the same file in cygwin in about 10 seconds.

Please suggest how I can get this to work for big files in my program.

Thanks

spkhaira asked Apr 30 '09 07:04



1 Answer

I suggest using the alternate method:

MD5CryptoServiceProvider.ComputeHash(Stream)

and just pass in an input stream opened on your file. This method will almost certainly not read the whole file into memory in one go.
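As a minimal sketch of that suggestion, the file can be opened as a stream and handed to ComputeHash(Stream). The class and method names (FileHasher, Md5OfFile) are made up for illustration, and MD5.Create() is used here instead of constructing MD5CryptoServiceProvider directly; both expose the same ComputeHash(Stream) overload.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper; the names are illustrative, not framework API.
public static class FileHasher
{
    // Hashes the file through a stream, so the whole file is never
    // held in a single byte[] the way File.ReadAllBytes would hold it.
    public static string Md5OfFile(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            byte[] hash = md5.ComputeHash(stream);

            // BitConverter.ToString yields "AB-CD-..."; removing the dashes
            // and lowercasing gives the same format md5sum prints.
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Calling FileHasher.Md5OfFile(file) would then take the place of the ReadAllBytes / ComputeHash(byte[]) pair in the question.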

I would also note that in most implementations of MD5 it's possible to add byte[] data into the digest function a chunk at a time, and then ask for the hash at the end.
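In .NET, one way to feed the digest a chunk at a time is the TransformBlock / TransformFinalBlock pair on HashAlgorithm. The following is a sketch only: the ChunkedHasher and Md5InChunks names and the 80 KB default buffer size are assumptions chosen for illustration.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper; the names and buffer size are illustrative.
public static class ChunkedHasher
{
    public static string Md5InChunks(Stream input, int bufferSize = 81920)
    {
        using (var md5 = MD5.Create())
        {
            var buffer = new byte[bufferSize];
            int read;

            // Feed each chunk into the running digest; only one buffer's
            // worth of data is in memory at any time.
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                md5.TransformBlock(buffer, 0, read, null, 0);
            }

            // An empty final block finishes the digest and populates Hash.
            md5.TransformFinalBlock(buffer, 0, 0);

            return BitConverter.ToString(md5.Hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

This form is mostly useful when the loop has to do something else per chunk, such as reporting progress; for plain hashing, ComputeHash(Stream) is shorter.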

Alnitak answered Sep 19 '22 15:09