There is a scenario where I need to verify the MD5 checksum of a file stored in an S3 bucket. This can be done at upload time by specifying the checksum value in the metadata of the API call, but in my case I want to verify the checksum after the data has been put into the bucket, programmatically. Every object in S3 has an attribute called 'ETag', which is the MD5 checksum calculated by S3.
Is there any way to get the ETag of a specific object and compare the checksums of the local file and the file stored in S3 using the boto3 client in a Python script?
ETag. The entity tag is a hash of the object. The ETag reflects changes only to the contents of an object, not its metadata. The ETag may or may not be an MD5 digest of the object data.
Files uploaded to Amazon S3 in a single PUT (which is limited to 5 GB) have an ETag that is simply the MD5 hash of the file, which makes it easy to check whether your local files match what you put on S3. (Multipart uploads and SSE-KMS-encrypted objects are the exceptions: their ETags are not plain MD5 digests.)
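Given that, here is a minimal sketch of the comparison. The function name md5_matches_etag is my own, the bucket, key, and local path are placeholders, and it assumes the object was uploaded in a single PUT so the ETag is a plain MD5 digest:

import hashlib

import boto3


def md5_matches_etag(bucket, key, local_path):
    # MD5 of the local file, read in chunks so large files
    # do not have to fit in memory
    md5 = hashlib.md5()
    with open(local_path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            md5.update(chunk)

    # The ETag comes back wrapped in double quotes, so strip them
    s3_cli = boto3.client('s3')
    etag = s3_cli.head_object(Bucket=bucket, Key=key)['ETag'].strip('"')
    return md5.hexdigest() == etag


print(md5_matches_etag('ventests3', 'config/ctl.json', 'ctl.json'))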
Boto3 is the official AWS SDK for Python, used to create, configure, and manage AWS services.
The boto3 API provides a way to get the metadata of an object stored in S3. The following snippet fetches it programmatically:
>>> import boto3
>>> import pprint
>>> s3_cli = boto3.client('s3')
>>> s3_resp = s3_cli.head_object(Bucket='ventests3', Key='config/ctl.json')
>>> pprint.pprint(s3_resp)
{'AcceptRanges': 'bytes',
 'ContentLength': 4325,
 'ContentType': 'binary/octet-stream',
 'ETag': '"040c003386f1e2001816d32f2125d07a"',
 'LastModified': datetime.datetime(2018, 9, 20, 7, 15, 3, tzinfo=tzutc()),
 'Metadata': {},
 'ResponseMetadata': {'HTTPHeaders': {'accept-ranges': 'bytes',
                                      'content-length': '4325',
                                      'content-type': 'binary/octet-stream',
                                      'date': 'Thu, 20 Sep 2018 07:20:53 GMT',
                                      'etag': '"040c003386f1e2001816d32f2125d07a"',
                                      'last-modified': 'Thu, 20 Sep 2018 07:15:03 GMT',
                                      'server': 'AmazonS3',
                                      'x-amz-id-2': 'P2wapOciWCKPfol2sBgoo11tRdr4KwKcDJ/nHW7LZn00mvKfMYyfAPPV2tIcf3Vu+lrV57NBARY=',
                                      'x-amz-request-id': '42AF970E7C9AA18C'},
                      'HTTPStatusCode': 200,
                      'HostId': 'P2wapOciWCKPfol2sBgoo11tRdr4KwKcDJ/nHW7LZn00mvKfMYyfAPPV2tIcf3Vu+lrV57NBARY=',
                      'RequestId': '42AF970E7C9AA18C',
                      'RetryAttempts': 0}}
>>> s3obj_etag = s3_resp['ETag'].strip('"')
>>> print(s3obj_etag)
040c003386f1e2001816d32f2125d07a
The head_object() method of the S3 client fetches the metadata (HTTP headers) of a given object stored in the S3 bucket, without downloading the object body itself.
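If you prefer boto3's resource interface, the same ETag is exposed as an attribute on the Object resource. A brief sketch, reusing the bucket and key from the transcript above:

>>> s3_res = boto3.resource('s3')
>>> s3_obj = s3_res.Object('ventests3', 'config/ctl.json')
>>> s3_obj.e_tag.strip('"')
'040c003386f1e2001816d32f2125d07a'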
Do not use the ETag of an object in one bucket to determine equivalence with an object (of the same key) in another bucket. In some experiments I found that, for large objects, the ETag is not consistent from region to region; for example, a large file in a bucket in us-east-1 may have a different ETag after it is copied to us-east-2. Whether the ETag survives a bucket-to-bucket copy varies from object to object: some large objects do keep the same ETag in both regions. I resorted to creating my own tags containing the md5sum, and when I copy an object from one bucket to another, I copy the tags as well. A sketch of that workaround is shown below.
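As a sketch of that tagging workaround: record the MD5 as an object tag at upload time and read it back later. The function names and the tag key 'md5' are my own choices, not an S3 convention:

import hashlib

import boto3

s3_cli = boto3.client('s3')


def upload_with_md5_tag(bucket, key, local_path):
    # Compute the MD5 up front so it can travel with the object as a tag
    md5 = hashlib.md5()
    with open(local_path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            md5.update(chunk)
    digest = md5.hexdigest()

    # Tagging takes URL-encoded key=value pairs; hex digits need no escaping
    with open(local_path, 'rb') as f:
        s3_cli.put_object(Bucket=bucket, Key=key, Body=f,
                          Tagging='md5=%s' % digest)
    return digest


def stored_md5_tag(bucket, key):
    # Read the tag back; returns None if the object has no 'md5' tag
    tags = s3_cli.get_object_tagging(Bucket=bucket, Key=key)['TagSet']
    return next((t['Value'] for t in tags if t['Key'] == 'md5'), None)

Unlike the ETag, a tag set written this way survives a copy_object call between buckets (copies keep the source tags by default), so the recorded md5sum stays with the object.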