
How to put an object to S3 with Content-MD5

I am trying to upload an XML file to S3 using boto3. As recommended by Amazon, I would like to send the base64-encoded 128-bit MD5 digest (Content-MD5) of the data.

https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.put

My Code:

with open(file, 'rb') as tempfile:
    body = tempfile.read()

hash_object = hashlib.md5(body)
base64_md5 = base64.encodebytes(hash_object.digest())

response = s3.Object(self.bucket, self.key + file).put(
            Body=body.decode(self.encoding),
            ACL='private',
            Metadata=metadata,
            ContentType=self.content_type,
            ContentEncoding=self.encoding,
            ContentMD5=str(base64_md5)
        )

When I try this, str(base64_md5) creates a string like "b'ZpL06Osuws3qFQJ8ktdBOw==\n'".

In this case, I get this error message:

An error occurred (InvalidDigest) when calling the PutObject operation: The Content-MD5 you specified was invalid.

For test purposes, I copied only the value without the 'b' in front: 'ZpL06Osuws3qFQJ8ktdBOw==\n'

Then I get this error message:

botocore.exceptions.HTTPClientError: An HTTP Client raised and unhandled exception: Invalid header value b'hvUe19qHj7rMbwOWVPEv6Q==\n'

Can anyone help me upload the file to S3 with a valid Content-MD5?

Thanks,

Oliver

Meschkov asked Oct 18 '18 18:10


2 Answers

Starting from @Isaac Fife's example, I stripped it down to identify what is actually required and added the imports to make it a fully reproducible example:

(the only change you need to make is to use your own bucket name)

import base64
import hashlib
import boto3

contents = "hello world!"
md = hashlib.md5(contents.encode('utf-8')).digest()
contents_md5 = base64.b64encode(md).decode('utf-8')

boto3.client('s3').put_object(
  Bucket="mybucket",
  Key="test",
  Body=contents,
  ContentMD5=contents_md5
)

Learnings: first, the MD5 you need to send will NOT look like the one an 'upload' returns. Content-MD5 has to be the base64 encoding of the raw digest, i.e. base64.b64encode(md), while the upload returns the hexdigest() form. Hex is base16, which is not base64.
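To make the difference concrete, here is a minimal sketch using only the standard library. It computes both forms for the same bytes; the variable names are just for illustration and nothing here talks to S3.

```python
import base64
import hashlib

contents = "hello world!".encode("utf-8")
md = hashlib.md5(contents)

# Hex form: 32 characters of base16, as returned by hexdigest().
hex_md5 = md.hexdigest()

# Base64 form of the raw 16-byte digest: 24 characters.
# This is the form to pass as ContentMD5.
b64_md5 = base64.b64encode(md.digest()).decode("ascii")

print(hex_md5)
print(b64_md5)

# Both strings encode the same 16 bytes, just in different alphabets.
assert bytes.fromhex(hex_md5) == base64.b64decode(b64_md5)
```

So if you want to compare the value you sent against a hex MD5, convert one of them first rather than comparing the strings directly.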

tedder42 answered Oct 19 '22 03:10


(Python 3.7)

Took me hours to figure this out because the only error you get is "The Content-MD5 you specified was invalid." Super useful for debugging... Anyway, here is the code I used to actually get the file to upload correctly before refactoring.

import boto3
from botocore.config import Config

# json_converter and md5 are project-local modules; md5.get_content_md5 is shown below.
# result, bucket and key_id are defined elsewhere.
json_results = json_converter.convert_to_json(result)
json_results_utf8 = json_results.encode('utf-8')
content_md5 = md5.get_content_md5(json_results_utf8)  # base64 digest, as bytes
content_md5_string = content_md5.decode('utf-8')  # ContentMD5 must be a str
metadata = {
    "md5chksum": content_md5_string
}
s3 = boto3.resource('s3', config=Config(signature_version='s3v4'))
obj = s3.Object(bucket, 'filename.json')
obj.put(
    Body=json_results_utf8,
    ContentMD5=content_md5_string,
    ServerSideEncryption='aws:kms',
    Metadata=metadata,
    SSEKMSKeyId=key_id)

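And the hashing:

import base64
import hashlib

def get_content_md5(data):
    digest = hashlib.md5(data).digest()  # raw 16-byte digest, not hexdigest()
    return base64.b64encode(digest)  # bytes; decode before passing to ContentMD5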

The hard part for me was figuring out what encoding you need at each step in the process, as I was not very familiar with how strings are stored in Python at the time.

get_content_md5 takes a bytes-like object only (here, the UTF-8 encoded JSON) and returns bytes as well. But to pass the MD5 hash to AWS, it needs to be a string, so you have to decode it before you give it to ContentMD5.
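As far as I can tell, this bytes vs. string distinction also explains both errors in the question. Here is a small sketch, standard library only, showing what str() and base64.encodebytes() do to the value. The input bytes are made up for the demo.

```python
import base64
import hashlib

digest = hashlib.md5(b"<xml/>").digest()

# encodebytes() appends a trailing newline, which is not a valid
# character in an HTTP header value.
with_newline = base64.encodebytes(digest)

# b64encode() returns the same base64 text without the newline.
clean = base64.b64encode(digest)

# str() on a bytes object returns its repr, including the b'...' wrapper,
# so the service receives something that is not valid base64.
wrong = str(clean)

# decode() returns the plain text, which is what ContentMD5 should get.
right = clean.decode("ascii")

assert with_newline == clean + b"\n"
assert wrong == "b'" + right + "'"
```

In other words: use b64encode() rather than encodebytes(), and decode() rather than str().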

Pro tip: Body, on the other hand, needs to be given bytes or a seekable file-like object. If you pass a seekable object, make sure you seek(0) back to the beginning of the file before you pass it to AWS, or the MD5 will not match. For that reason, using bytes is less error-prone, imo.
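If you do want to pass a file object, something along these lines should keep the hash and the body in sync. This is only a sketch: md5_and_rewind is a hypothetical helper name of mine, and io.BytesIO stands in for a real opened file so the example runs without S3.

```python
import base64
import hashlib
import io


def md5_and_rewind(fileobj):
    """Return the base64 MD5 of a seekable binary file object, then rewind it."""
    fileobj.seek(0)
    digest = hashlib.md5(fileobj.read()).digest()
    # read() left the position at the end of the file; rewind so that
    # whoever reads next (for example Body=fileobj) sees the full content.
    fileobj.seek(0)
    return base64.b64encode(digest).decode("ascii")


# io.BytesIO stands in for open(path, 'rb') here.
stream = io.BytesIO(b"hello world!")
content_md5 = md5_and_rewind(stream)
```

After this, stream and content_md5 could be passed as Body and ContentMD5 respectively. Note that this reads the whole file into memory to hash it, which is fine for small files but not something I would do for very large ones.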

Isaac Fife answered Oct 19 '22 03:10