
The MD5 from a local file and the MD5 (eTag) from S3 is not the same

I compute the MD5 of a local file, but it differs from the MD5 (eTag) of the "same" file in Amazon S3. What I want to find out is whether the latest file I have in S3 is the same one I have locally. If I cannot compare MD5s, how should I do it?

Generating MD5 from the local file (truncated code):

MessageDigest md = MessageDigest.getInstance("MD5");
byte[] md5 = Files.getDigest(localFile, md);
String hashtext = DigestUtils.md5Hex(md5);

Retrieving MD5 (eTag) from S3 (truncated code):

ObjectListing objectListing = s3.listObjects(new ListObjectsRequest().withBucketName(bucketName));
List<S3ObjectSummary> objectSummaries = objectListing.getObjectSummaries();
for(S3ObjectSummary objectSummary : objectSummaries) {
    String MD5 = objectSummary.getETag();
}

PS: I use org.apache.commons.codec.digest.DigestUtils and com.google.common.io.Files libraries.

okysabeni asked Jun 06 '11 18:06


2 Answers

String hashtext = DigestUtils.md5Hex(md5);

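This line calculates the MD5 of the MD5 digest you just calculated. See the DigestUtils.md5Hex documentation: md5Hex hashes its input and then hex-encodes the result, it does not just hex-encode the bytes you pass in.

hashtext is in fact MD5(MD5(file)) and not MD5(file). To get the hex string of the digest you already have, hex-encode the bytes instead, for example with org.apache.commons.codec.binary.Hex.encodeHexString(md5).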
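To make the difference concrete, here is a minimal sketch using only the JDK (no Commons Codec or Guava). The class name DoubleHashDemo and the helpers md5 and toHex are made up for illustration, and the string "hello" stands in for the file contents.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; the names here are not from the question.
class DoubleHashDemo {

    // Raw 16-byte MD5 digest of the given bytes.
    static byte[] md5(byte[] data) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("MD5").digest(data);
    }

    // Lowercase hex encoding only, no hashing involved.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the file contents.
        byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] digest = md5(content);

        // What you want to compare against the eTag: hex of the digest.
        System.out.println("MD5(file):      " + toHex(digest));

        // What the question's code effectively produced: the digest hashed again.
        System.out.println("MD5(MD5(file)): " + toHex(md5(digest)));
    }
}
```

The two printed values differ, which is why the locally computed string never matched the one S3 reported.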

Bruno Rohée answered Sep 28 '22 04:09


Bruno's answer nails it, but I wanted to point out that if you want to do this without the Google Guava dependency, it's not that difficult (especially if you're already using Apache Commons Codec).

You'd replace this:

byte[] md5 = Files.getDigest(localFile, md);

with this (using a Java 7 try-with-resources block):

try (FileInputStream fis = new FileInputStream(localFile)) {
    // md5 is only in scope inside this block; declare it outside if you need it later
    byte[] md5 = DigestUtils.md5(fis);
}

This md5(InputStream) method has been in Apache Commons Codec since version 1.4.
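If you would rather avoid both libraries, the same idea can be sketched with the JDK alone. This is a rough sketch, not a drop-in replacement: the class LocalFileMd5 and the methods md5Hex and matchesETag are hypothetical names. It also assumes the eTag is a plain hex MD5, possibly wrapped in quotes. That is not always the case (an eTag from a multipart upload contains a "-" and is not the MD5 of the whole object), so such values are treated as not comparable here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; names are illustrative only.
class LocalFileMd5 {

    // Streams the file through MessageDigest and returns the lowercase hex MD5.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Assumption: eTag is a hex MD5, optionally quoted. An eTag containing '-'
    // is not a plain MD5, so this returns false ("cannot confirm") for it.
    static boolean matchesETag(Path file, String eTag)
            throws IOException, NoSuchAlgorithmException {
        String normalized = eTag.replace("\"", "").toLowerCase();
        if (normalized.contains("-")) {
            return false;
        }
        return normalized.equals(md5Hex(file));
    }
}
```

In the loop from the question you would call something like LocalFileMd5.matchesETag(localPath, objectSummary.getETag()), where localPath is whatever local file corresponds to that S3 key.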

Eyal answered Sep 28 '22 04:09