I am looking to use Java or Groovy to get the MD5 checksum of a complete directory.
I have to copy directories from source to target, checksum both source and target, and afterwards delete the source directories.
I found this script for files, but how do I do the same thing with directories?
import java.security.MessageDigest

def generateMD5(final file) {
    MessageDigest digest = MessageDigest.getInstance("MD5")
    file.withInputStream() { is ->
        byte[] buffer = new byte[8192]
        int read = 0
        while ((read = is.read(buffer)) > 0) {
            digest.update(buffer, 0, read)
        }
    }
    byte[] md5sum = digest.digest()
    BigInteger bigInt = new BigInteger(1, md5sum)
    return bigInt.toString(16).padLeft(32, '0')
}
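For reference, the same per-file hashing can be written in plain Java with the JDK's DigestInputStream, which updates the digest as a side effect of reading. This is a minimal sketch (the class name and temp-file demo are illustrative, not part of the original script):

```java
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;

public class FileMd5 {
    // Stream the file through a DigestInputStream so the whole
    // file never has to fit in memory.
    static String md5Of(Path file) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), digest)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // digest is updated as a side effect of each read
            }
        }
        // %032x pads to 32 hex chars, matching the Groovy padLeft(32, '0')
        return String.format("%032x", new BigInteger(1, digest.digest()));
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("md5demo", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(md5Of(tmp)); // prints 5d41402abc4b2a76b9719d911017c592
    }
}
```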
Is there a better approach ?
I had the same requirement and chose my 'directory hash' to be an MD5 hash of the concatenated streams of all (non-directory) files within the directory. As crozin mentioned in comments on a similar question, you can use SequenceInputStream to act as a single stream that concatenates a load of other streams. I'm using Apache Commons Codec for the MD5 algorithm.
Basically, you recurse through the directory tree, adding FileInputStream instances to a Vector for non-directory files. Vector then conveniently has the elements() method to provide the Enumeration that SequenceInputStream needs to loop through. To the MD5 algorithm, this just appears as one InputStream.
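The concatenation trick can be shown in isolation. This sketch (class name illustrative; it uses plain MessageDigest rather than Commons Codec so it is JDK-only) hashes two in-memory "files" through a SequenceInputStream and checks that the result equals the MD5 of the concatenated bytes:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Vector;

public class SeqStreamMd5 {
    // MD5 any InputStream, buffer by buffer.
    static String md5Of(InputStream in) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        return String.format("%032x", new BigInteger(1, digest.digest()));
    }

    public static void main(String[] args) throws Exception {
        // Two "files" as in-memory streams; Vector.elements() supplies
        // the Enumeration that SequenceInputStream requires.
        Vector<InputStream> streams = new Vector<>();
        streams.add(new ByteArrayInputStream("foo".getBytes(StandardCharsets.UTF_8)));
        streams.add(new ByteArrayInputStream("bar".getBytes(StandardCharsets.UTF_8)));
        String seqHash = md5Of(new SequenceInputStream(streams.elements()));

        // Hashing the concatenated bytes directly gives the same digest.
        String directHash = md5Of(
                new ByteArrayInputStream("foobar".getBytes(StandardCharsets.UTF_8)));

        System.out.println(seqHash.equals(directHash) ? "match" : "mismatch");
        System.out.println(seqHash); // MD5 of "foobar"
    }
}
```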
A gotcha is that you need the files presented in the same order every time for the hash to be the same for the same inputs. The listFiles() method in File doesn't guarantee an ordering, so I sort by filename.
I was doing this for SVN controlled files, and wanted to avoid hashing the hidden SVN files, so I implemented a flag to avoid hidden files.
The relevant basic code is as below. (Obviously it could be 'hardened'.)
import org.apache.commons.codec.digest.DigestUtils;

import java.io.*;
import java.util.*;

public String calcMD5HashForDir(File dirToHash, boolean includeHiddenFiles) {
    assert (dirToHash.isDirectory());
    Vector<FileInputStream> fileStreams = new Vector<FileInputStream>();

    System.out.println("Found files for hashing:");
    collectInputStreams(dirToHash, fileStreams, includeHiddenFiles);

    SequenceInputStream seqStream =
            new SequenceInputStream(fileStreams.elements());

    try {
        String md5Hash = DigestUtils.md5Hex(seqStream);
        seqStream.close();
        return md5Hash;
    }
    catch (IOException e) {
        throw new RuntimeException("Error reading files to hash in "
                + dirToHash.getAbsolutePath(), e);
    }
}

private void collectInputStreams(File dir,
                                 List<FileInputStream> foundStreams,
                                 boolean includeHiddenFiles) {
    File[] fileList = dir.listFiles();
    Arrays.sort(fileList,               // Need in reproducible order
            new Comparator<File>() {
                public int compare(File f1, File f2) {
                    return f1.getName().compareTo(f2.getName());
                }
            });

    for (File f : fileList) {
        if (!includeHiddenFiles && f.getName().startsWith(".")) {
            // Skip it
        }
        else if (f.isDirectory()) {
            collectInputStreams(f, foundStreams, includeHiddenFiles);
        }
        else {
            try {
                System.out.println("\t" + f.getAbsolutePath());
                foundStreams.add(new FileInputStream(f));
            }
            catch (FileNotFoundException e) {
                throw new AssertionError(e.getMessage()
                        + ": file should never not be found!");
            }
        }
    }
}
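On Java 8+, a JDK-only alternative to the hand-rolled recursion above (an assumption on my part, not part of the original answer) is Files.walk with sorted(), which gives a deterministic, recursive file listing in one pipeline. A sketch, with an illustrative class name and a temp-directory demo:

```java
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DirMd5 {
    // Hash all regular files under dir, visited in sorted (reproducible) order.
    static String md5OfDir(Path dir) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile)
                        .sorted()       // Path's natural ordering is stable
                        .collect(Collectors.toList());
        }
        byte[] buffer = new byte[8192];
        for (Path f : files) {
            try (InputStream in = Files.newInputStream(f)) {
                int read;
                while ((read = in.read(buffer)) != -1) {
                    digest.update(buffer, 0, read);
                }
            }
        }
        return String.format("%032x", new BigInteger(1, digest.digest()));
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("dirmd5");
        // Created out of order on purpose; sorting makes the hash reproducible.
        Files.write(dir.resolve("b.txt"), "bar".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("a.txt"), "foo".getBytes(StandardCharsets.UTF_8));

        System.out.println(md5OfDir(dir).equals(md5OfDir(dir))); // prints true
        System.out.println(md5OfDir(dir)); // MD5 of "foo" + "bar"
    }
}
```

Unlike the FileInputStream/Vector approach, nothing is held open across the whole walk: each file's stream is opened and closed in turn while a single MessageDigest accumulates the result.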