Combining MD5 hash values

Tags: algorithm, md5, checksum

When calculating a single MD5 checksum on a large file, what technique is generally used to combine the various MD5 values into a single value? Do you just add them together? I'm not really interested in any particular language, library or API which will do this; rather I'm just interested in the technique behind it. Can someone explain how it is done?

Given the following algorithm in pseudo-code:

MD5Digest X
for each file segment F
   MD5Digest Y = CalculateMD5(F)
   Combine(X,Y)

But what exactly would Combine do? Does it add the two MD5 digests together, or what?

847

asked Feb 06 '10 18:02

channel72

2 Answers

In order to calculate MD5 values for files which are too large to fit in memory

With that in mind, you don't want to "combine" two MD5 hashes. Any MD5 implementation gives you an object that keeps the current checksum state, so you can extract the MD5 checksum at any time, which is very handy when hashing two files that share the same beginning. For big files, you just keep feeding in data. It makes no difference whether you hash the file all at once or in blocks, because the state is remembered between calls. In both cases you will get the same hash.
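To make the "at once or in blocks" point concrete, here is a minimal sketch using Java's standard MessageDigest class. The class name, the method names and the block size are just illustrative choices, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ChunkedMd5Demo {
    // Hash the whole array with a single call.
    public static byte[] md5AllAtOnce(byte[] data) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("MD5").digest(data);
    }

    // Hash the same bytes by feeding one digest object block by block.
    // blockSize is an arbitrary, illustrative parameter.
    public static byte[] md5InBlocks(byte[] data, int blockSize) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        for (int off = 0; off < data.length; off += blockSize) {
            int len = Math.min(blockSize, data.length - off);
            digest.update(data, off, len); // the state carries over between calls
        }
        return digest.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "The quick brown fox jumps over the lazy dog"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] whole = md5AllAtOnce(data);
        byte[] blocks = md5InBlocks(data, 7);
        System.out.println(Arrays.equals(whole, blocks)); // prints "true"
    }
}
```

No Combine step appears anywhere: the only "combining" is the digest object updating its own internal state as each block arrives.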

122

answered Sep 21 '22 10:09

AndiDog

MD5 is an iterative algorithm. You don't need to calculate a ton of small MD5s and then combine them somehow. You just read small chunks of the file and add them to the digest as you go, so you never have to have the entire file in memory at once. Here's a Java implementation.

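import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

MessageDigest digest = MessageDigest.getInstance("MD5");
byte[] buffer = new byte[8192];
// try-with-resources closes the stream even if a read fails
try (InputStream f = new FileInputStream("bigFile.txt")) {
    int len;
    while ((len = f.read(buffer)) != -1) {
        digest.update(buffer, 0, len); // hash only the bytes actually read
    }
}
byte[] md5hash = digest.digest(); // 16-byte MD5 of the whole file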

Et voila. You have the MD5 of an entire file without ever having the whole file in memory at once.

It's worth noting that if for some reason you do want MD5 hashes of subsections of the file as you go along (this is sometimes useful for doing interim checks on a large file being transferred over a low-bandwidth connection), then you can get them by cloning the digest object at any time, like so:

byte[] interimHash = ((MessageDigest)digest.clone()).digest();

This does not affect the actual digest object so you can continue to work with the overall MD5 hash.
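A small sketch of that claim, assuming the default MD5 provider supports clone() (the standard JDK one does). The class name, the interimHash helper and the sample strings are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

class InterimDigestDemo {
    // Snapshot the hash of everything fed so far without finalizing the running digest.
    public static byte[] interimHash(MessageDigest running) throws CloneNotSupportedException {
        return ((MessageDigest) running.clone()).digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] part1 = "first segment, ".getBytes(StandardCharsets.US_ASCII);
        byte[] part2 = "second segment".getBytes(StandardCharsets.US_ASCII);

        MessageDigest running = MessageDigest.getInstance("MD5");
        running.update(part1);
        byte[] interim = interimHash(running); // hash of part1 only
        running.update(part2);
        byte[] overall = running.digest();     // hash of part1 followed by part2

        // Recompute both values independently for comparison.
        MessageDigest check = MessageDigest.getInstance("MD5");
        byte[] part1Only = check.digest(part1); // digest(...) also resets the object
        check.update(part1);
        check.update(part2);
        byte[] both = check.digest();

        System.out.println(Arrays.equals(interim, part1Only)); // prints "true"
        System.out.println(Arrays.equals(overall, both));      // prints "true"
    }
}
```

Calling digest() directly on the running object instead of on a clone would reset it, which is why the copy is needed for interim values.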

It's also worth noting that MD5 is an outdated hash for cryptographic purposes (such as verifying file authenticity from an untrusted source) and should be replaced with something better in most circumstances, such as SHA-256 (SHA-1 has since been shown to have practical collisions as well). For non-cryptographic purposes, such as verifying file integrity between two trusted sources, MD5 is still adequate.
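Switching to a stronger hash does not change the streaming technique, only the algorithm name passed to MessageDigest.getInstance. A rough sketch, where SHA-256 is just my pick of example algorithm and hashStream and toHex are hypothetical helper names:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class StreamingHashDemo {
    // Stream everything from `in` through the named algorithm, e.g. "MD5" or "SHA-256".
    public static byte[] hashStream(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192];
        int len;
        while ((len = in.read(buffer)) != -1) {
            digest.update(buffer, 0, len);
        }
        return digest.digest();
    }

    // Lowercase hex encoding, the form most checksum tools print.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toHex(hashStream(new ByteArrayInputStream(data), "MD5")));
        System.out.println(toHex(hashStream(new ByteArrayInputStream(data), "SHA-256")));
    }
}
```

For a real file you would pass a FileInputStream instead of the in-memory stream used here.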

answered Sep 18 '22 10:09

Jherico
