Referred Posts: Amazon S3 & Checksum, How to encode md5 sum into base64 in BASH
I have to download a tar file from an S3 bucket with limited access (for the most part, I only have permission to download).
After downloading, I have to check the MD5 checksum of the downloaded file against the MD5 checksum stored as metadata in S3.
I currently use an S3 file browser to manually note the "x-amz-meta-md5" value in the content headers and validate that value against the computed MD5 of the downloaded file.
I would like to know if there is a programmatic way, using boto, to capture the MD5 hash value of an S3 file as stored in its metadata.
from boto.s3.connection import S3Connection
conn = S3Connection(access_key, secret_key)
bucket=conn.get_bucket("test-bucket")
rs_keys = bucket.get_all_keys()
for key_val in rs_keys:
    print key_val, key_val.**HOW_TO_GET_MD5_FROM_METADATA(?)**
Please correct me if my understanding is wrong. I am looking for a way to capture the header data programmatically.
Open a terminal window. Type the following command: md5sum [path of the file, including its name and extension] -- NOTE: You can also drag the file to the terminal window instead of typing the full path. Hit the Enter key. You'll see the MD5 sum of the file, followed by the file name.
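If you would rather compute the local checksum from Python than from the shell, a minimal sketch using only the standard library could look like the following. The function name `file_md5` and the 1 MiB chunk size are illustrative choices, not anything taken from boto or S3.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so that a large tar file does not have
    to fit in memory. `chunk_size` is an arbitrary illustrative value.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is the same hexadecimal value that md5sum prints in its first column, so it can be compared directly with a checksum recorded elsewhere.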
Each object on S3 gets an ETag, which for objects uploaded in a single request is essentially the MD5 checksum of that object.
For non-multipart uploads: the ETag is simply the hexadecimal text representation of the MD5 checksum of the file.
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers scalability, data availability, security, and performance.
When boto downloads a file using any of the get_contents_to_* methods, it computes the MD5 checksum of the bytes it downloads and makes that available as the md5 attribute of the Key object. In addition, S3 sends an ETag header in the response that represents the server's idea of what the MD5 checksum is. This is available as the etag attribute of the Key object. So, after downloading a file you could just compare the value of those two attributes to see if they match.
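One practical wrinkle with that comparison is that the ETag is often delivered wrapped in double quotes, and a multipart ETag has a "-<parts>" suffix that makes it not a plain MD5. Here is a small sketch of a comparison helper that allows for both. The name `etag_matches_md5` is made up for illustration, and it assumes both values are available as text strings.

```python
def etag_matches_md5(etag, md5_hex):
    """Compare an S3 ETag with a locally computed hex MD5 digest.

    Returns True or False for plain ETags, and None when the ETag has
    the "<hex>-<parts>" multipart form, which is not a plain MD5 and so
    cannot be compared this way.
    """
    # The ETag header value is usually quoted, e.g. '"9a0364b9..."'.
    cleaned = etag.strip().strip('"').lower()
    if "-" in cleaned:
        return None
    return cleaned == md5_hex.strip().lower()
```

After something like key.get_contents_to_filename(...), you would call etag_matches_md5(key.etag, key.md5), assuming key.md5 holds the hex digest as a string in your boto version.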
If you want to find out what S3 thinks the MD5 is without actually downloading the file (as shown in your example) you could just do this:
for key_val in rs_keys:
    print key_val, key_val.etag
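For the "x-amz-meta-md5" value asked about in the question, the ETag is not the same thing. As far as I know, user metadata is not included in bucket listings, so each key has to be fetched individually with bucket.get_key(), which issues a HEAD request, and then read with key.get_metadata(). A rough sketch, assuming `bucket` is a boto Bucket like the one in the question (the helper names are made up for illustration):

```python
def metadata_md5(bucket, key_name, meta_name="md5"):
    """Return the value stored under x-amz-meta-<meta_name>, or None.

    `bucket` is assumed to behave like a boto Bucket: get_key() does a
    HEAD request and returns a Key, or None if the object is missing.
    """
    key = bucket.get_key(key_name)
    if key is None:
        return None
    # get_metadata() takes the name without the "x-amz-meta-" prefix.
    return key.get_metadata(meta_name)


def all_metadata_md5(bucket):
    """Map every listed key name to its x-amz-meta-md5 value."""
    return {
        listed.name: metadata_md5(bucket, listed.name)
        for listed in bucket.get_all_keys()
    }
```

This costs one extra request per object, so for a large bucket it may be better to call metadata_md5() only for the files you actually download.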
It seems well established that the ETag is not the md5sum if the file was assembled from a multipart upload. I think in that case one's only recourse is to download the file and perform a checksum locally. If the result is correct, the S3 copy must be good. If the local checksum is wrong, the S3 copy may be bad, or the download might have failed. If you no longer have the original file or a record of its md5sum, I think you're out of luck. It would be great if the md5sum of the assembled file were available, or if there were a way to locally compute the expected ETag of a file to be uploaded via multipart.
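On the last point, a commonly reported convention is that a multipart ETag is the MD5 of the concatenated binary MD5 digests of each part, followed by "-" and the number of parts. The sketch below reproduces that convention locally. Treat it as an assumption rather than a guarantee: it only works if you know the part size the uploader used, and it may not hold for every object (for example, some encrypted objects). The function name `expected_multipart_etag` is illustrative.

```python
import hashlib


def expected_multipart_etag(path, part_size):
    """Estimate the ETag S3 would report for a multipart upload of `path`.

    Assumes the commonly observed scheme: MD5 over the concatenated raw
    MD5 digests of the parts, plus "-<number of parts>". `part_size`
    must equal the part size used by the uploading tool.
    """
    part_digests = []
    with open(path, "rb") as handle:
        # Read the file one part at a time and hash each part separately.
        for part in iter(lambda: handle.read(part_size), b""):
            part_digests.append(hashlib.md5(part).digest())
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return "{0}-{1}".format(combined, len(part_digests))
```

If the value computed this way matches the ETag reported for the object (after stripping the surrounding quotes), that is reasonable evidence the S3 copy matches the local file. A mismatch may only mean the assumed part size was wrong.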