
How can I set the Content-MD5 when I upload a file to S3?

I am trying to set the Content-MD5 value when I upload a file to S3. I can compute the MD5 hash string and I pass it to metadata.setContentMD5(), but after the file is uploaded I can't see this value in the web console, and I can't retrieve it from Java code.

I've come to think that I'm probably misunderstanding the purpose of the content MD5 get/set methods. Are they there to let the AWS server validate that the file content it received is consistent with what I sent? If so, I should pass a value to setContentMD5(my_md5) when uploading, but should I then just compare the value of getETag() with my calculated MD5 hex string when I later download that object from S3?

Am I doing something wrong in trying to set this md5 value?

String access_key = "myaccesskey";
String secret_key = "mysecretkey";
String bucket_name = "mybucketname";
String destination_key = "md5_test.txt";
String file_path = "C:\\my-text-file.txt";

BasicAWSCredentials creds = new BasicAWSCredentials(access_key, secret_key);
AmazonS3Client client = new AmazonS3Client(creds);
client.setRegion(RegionUtils.getRegion("us-east-1"));

File file = new File(file_path);

ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentType("text/plain");
metadata.setContentLength(file.length());

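// Read the whole file into memory; try-with-resources closes the stream.
byte[] content_bytes;
try (FileInputStream fis = new FileInputStream(file)) {
    content_bytes = IOUtils.toByteArray(fis);
}
// Content-MD5 is the base64 encoding of the raw 16-byte digest (not of the hex string).
String md5 = new String(Base64.encodeBase64(DigestUtils.md5(content_bytes)));
metadata.setContentMD5(md5);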

PutObjectRequest req = new PutObjectRequest(bucket_name, destination_key, file).withMetadata(metadata);
PutObjectResult result = client.putObject(req);

GetObjectMetadataRequest mreq = new GetObjectMetadataRequest(bucket_name, destination_key);
ObjectMetadata retrieved_metadata = client.getObjectMetadata(mreq);

// I think I expected getContentMD5 below to show the string I passed in
// during the upload, but the below prints "md5:null"
System.out.println("md5:" + retrieved_metadata.getContentMD5());

Am I calculating the MD5 string incorrectly? If I pass in a random string, I do get an error message, so S3 seems to accept what the code above sends. And if the MD5 string is correct, why can't I retrieve it later with retrieved_metadata.getContentMD5()? I understand that the ETag should be the MD5 hex string, and I can calculate that for my uploaded file (and get the same string that S3 calculates). So should I never expect getContentMD5() to have a value for an object read back from S3?

Chris Farmer asked Feb 14 '16 21:02


People also ask

What is MD5 in S3?

When you use PutObject to upload objects to Amazon S3, pass the Content-MD5 value as a request header. Amazon S3 checks the object against the provided Content-MD5 value. If the values do not match, you receive an error. The Content-MD5 request header can also be used with the S3 UploadPart API.

Is S3 ETag MD5?

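Objects uploaded to Amazon S3 in a single PUT (possible for objects up to 5GB), either unencrypted or encrypted with SSE-S3, have an ETag that is simply the MD5 hash of the file, which makes it easy to check whether your local files match what you put on S3. Objects created by multipart upload, or encrypted with SSE-KMS or SSE-C, have ETags that are not an MD5 digest of the content.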
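
Before comparing an ETag with a locally computed MD5, it can help to check whether the ETag even has the shape of a plain MD5. The sketch below is only a heuristic: it assumes the commonly seen formats, where a single-part upload yields 32 hex characters and a multipart upload yields a value with a "-<part count>" suffix. The class and method names are hypothetical, and a 32-character ETag is still not guaranteed to be the content MD5 (for example with SSE-KMS).

```java
import java.util.regex.Pattern;

final class ETagShape {
    // Exactly 32 hex characters, the length of a hex-encoded MD5 digest.
    private static final Pattern PLAIN_MD5 = Pattern.compile("[0-9a-fA-F]{32}");

    /** Removes the surrounding double quotes that raw HTTP ETag headers carry. */
    static String stripQuotes(String etag) {
        if (etag.length() >= 2 && etag.startsWith("\"") && etag.endsWith("\"")) {
            return etag.substring(1, etag.length() - 1);
        }
        return etag;
    }

    /** True if the ETag has the shape of a hex MD5 (no multipart "-N" suffix). */
    static boolean looksLikePlainMd5(String etag) {
        return etag != null && PLAIN_MD5.matcher(stripQuotes(etag)).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikePlainMd5("5d41402abc4b2a76b9719d911017c592"));
        System.out.println(looksLikePlainMd5("5d41402abc4b2a76b9719d911017c592-3"));
    }
}
```

Only when this check passes is a direct string comparison against a local hex MD5 worth attempting.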

What is checksum in S3 bucket?

Amazon S3 uses checksum values to verify the integrity of data that you upload to or download from Amazon S3. In addition, you can request that another checksum value be calculated for any object that you store in Amazon S3.


2 Answers

I think you are correct: getContentMD5() is just the corresponding getter for setContentMD5(). It tells you what the caller's side of the request thinks the MD5 hash is. If you want to know what AWS thinks the hash is, you should use the ETag.

getContentMD5

This field represents the base64 encoded 128-bit MD5 digest of an object's content as calculated on the caller's side. The ETag metadata field represents the hex encoded 128-bit MD5 digest as computed by Amazon S3.

Returns: The base64 encoded MD5 hash of the content for the associated object. Returns null if the MD5 hash of the content hasn't been set.

That last part presumably means: Returns null unless you have previously called setContentMD5()
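
To make the base64 versus hex distinction concrete, here is a minimal sketch that hashes some bytes once and renders the digest both ways: base64 (the form setContentMD5() takes) and lowercase hex (the form to compare against getETag() for a simple, non-multipart upload). It uses only JDK classes (java.security.MessageDigest, java.util.Base64) instead of the commons-codec calls in the question, and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class Md5Forms {
    /** Raw 16-byte MD5 digest of the content. */
    static byte[] md5(byte[] content) {
        try {
            return MessageDigest.getInstance("MD5").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Base64 form, as passed to setContentMD5(). */
    static String toBase64(byte[] digest) {
        return Base64.getEncoder().encodeToString(digest);
    }

    /** Lowercase hex form, comparable to the ETag of a single-part upload. */
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Converts a base64 Content-MD5 value to hex so it can be compared with an ETag. */
    static String base64ToHex(String contentMd5) {
        return toHex(Base64.getDecoder().decode(contentMd5));
    }

    public static void main(String[] args) {
        byte[] digest = md5("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("Content-MD5 (base64): " + toBase64(digest));
        System.out.println("MD5 (hex):            " + toHex(digest));
    }
}
```

With a helper like base64ToHex, the value sent at upload time can be checked against the ETag read back later, since both encode the same 16 bytes.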

Brian R Armstrong answered Sep 22 '22 08:09


You do not need to pass an MD5 string, but if you provide one, Amazon uses it to validate the transmission and make sure that what it received is not corrupted.

The Content-MD5 value is only meaningful during transmission; its life cycle ends once the upload has been received and validated. Persisting it on the server side would serve no purpose.

The getter is merely to make the API complete, so you can inspect what you did earlier using the setter.
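
The transmission check described above can be simulated locally. The following is a sketch of the idea only, not S3's actual implementation: the receiver recomputes the MD5 of the bytes it got and compares it with the declared base64 value, rejecting the upload on a mismatch. All names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class ContentMd5Check {
    /** Base64-encoded MD5 of the body, i.e. what the sender would declare. */
    static String contentMd5(byte[] body) {
        return Base64.getEncoder().encodeToString(md5(body));
    }

    /** Receiver side: does the declared value match the bytes that arrived? */
    static boolean matches(byte[] receivedBody, String declaredContentMd5) {
        byte[] declared;
        try {
            declared = Base64.getDecoder().decode(declaredContentMd5);
        } catch (IllegalArgumentException e) {
            return false; // not valid base64, so it cannot be a valid digest
        }
        return MessageDigest.isEqual(declared, md5(receivedBody));
    }

    private static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] sent = "hello".getBytes(StandardCharsets.UTF_8);
        String declared = contentMd5(sent);

        // Simulate corruption in transit by flipping one bit.
        byte[] corrupted = sent.clone();
        corrupted[0] ^= 0x01;

        System.out.println("intact body accepted:    " + matches(sent, declared));
        System.out.println("corrupted body accepted: " + matches(corrupted, declared));
    }
}
```

Once such a check has passed, the declared value has done its job, which is consistent with getContentMD5() returning null on metadata fetched later.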

Peter Pei Guo answered Sep 19 '22 08:09