I compute the MD5 of a local file, but it differs from the MD5 (ETag) of the "same" file in Amazon S3. What I want to find out is whether the latest file I have in S3 is the same as the one I have locally. If I cannot compare MD5s, how should I do it?
Generating MD5 from the local file (truncated code):
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] md5 = Files.getDigest(localFile, md);
String hashtext = DigestUtils.md5Hex(md5);
Retrieving MD5 (eTag) from S3 (truncated code):
ObjectListing objectListing = s3.listObjects(new ListObjectsRequest().withBucketName(bucketName));
List<S3ObjectSummary> objectSummaries = objectListing.getObjectSummaries();
for(S3ObjectSummary objectSummary : objectSummaries) {
    String MD5 = objectSummary.getETag();
}
PS: I use the org.apache.commons.codec.digest.DigestUtils and com.google.common.io.Files libraries.
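This line:
String hashtext = DigestUtils.md5Hex(md5);
calculates the MD5 of the MD5 you just calculated (see the DigestUtils.md5Hex documentation). So hashtext is in fact MD5(MD5(file)) and not MD5(file). The md5 byte array is already the digest of the file; it only needs to be hex-encoded (for example with Hex.encodeHexString(md5) from Commons Codec), not hashed a second time.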
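To make the difference concrete, here is a minimal sketch that hashes the file exactly once and compares the result with an ETag string, using only the JDK. The class name Md5Check and the helpers md5Hex and matchesETag are hypothetical names chosen for illustration; they are not part of the AWS SDK, Guava, or Commons Codec. It also assumes the object was uploaded in a single part: for multipart uploads the ETag is not a plain MD5 of the content (it carries a "-N" suffix), so the sketch simply reports no match in that case.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {

    // Hex-encoded MD5 of the file's contents, hashed exactly once.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        // Hex-encode the 16 digest bytes; do NOT feed them to another MD5.
        byte[] digest = md.digest();
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Hypothetical helper: compares a local MD5 hex string with an S3 ETag.
    // Assumption: a single-part upload, where the ETag is the MD5 of the content.
    static boolean matchesETag(String localMd5Hex, String eTag) {
        String cleaned = eTag.replace("\"", ""); // tolerate surrounding quotes
        if (cleaned.contains("-")) {
            // Multipart ETag ("<hash>-<parts>"): not a plain MD5, cannot compare.
            return false;
        }
        return cleaned.equalsIgnoreCase(localMd5Hex);
    }
}
```

In the loop from the question, this would be used roughly as Md5Check.matchesETag(Md5Check.md5Hex(localFile.toPath()), objectSummary.getETag()). When the ETag is a multipart one, a different comparison (for example size plus a checksum stored by the uploader) would be needed.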
Bruno's answer nails it, but I wanted to point out that if you want to do this without the Google Guava dependency, it's actually not that difficult (especially if you're already using Apache Commons Codec).
You'd replace this:
byte[] md5 = Files.getDigest(localFile, md);
with this (using a Java 7 try-with-resources block):
byte[] md5; // declared outside the try so it is still in scope afterwards
try (FileInputStream fis = new FileInputStream(localFile)) {
    md5 = DigestUtils.md5(fis);
}
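This md5(InputStream) method has been in Apache Commons Codec since version 1.4. The companion md5Hex(InputStream) returns the hex string directly, which avoids the double hashing altogether.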