How can I check whether a local file is the same as a file stored in S3 without downloading it? I want to avoid downloading large files again and again. S3 objects have ETags, but they are difficult to compute if the file was uploaded in parts, and the solution from this question doesn't seem to work. Is there some easier way to avoid unnecessary downloads?
I would just compare the last modified time and download if they differ. Additionally, you can compare the size before downloading. Given a bucket, a key and a local file fname:
import boto3
import os.path

def isModified(bucket, key, fname):
    """Return True if the S3 object's last-modified time differs from the local file's mtime."""
    s3 = boto3.resource('s3')
    obj = s3.Object(bucket, key)
    # obj.last_modified is a timezone-aware datetime; compare epoch seconds.
    # (datetime.timestamp() is portable, unlike strftime('%s').)
    return int(obj.last_modified.timestamp()) != int(os.path.getmtime(fname))
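If you would rather compare content than timestamps: for a plain single-part upload without SSE-KMS encryption, the ETag is simply the object's MD5 hex digest, so you can compare it against a locally computed MD5. A minimal sketch (the comparison line against boto3's obj.e_tag is illustrative; note the ETag value is wrapped in double quotes):

```python
import hashlib

def local_md5(fname, chunk_size=8 * 1024 * 1024):
    """Hex MD5 of a local file, computed in chunks to bound memory use."""
    md5 = hashlib.md5()
    with open(fname, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            md5.update(chunk)
    return md5.hexdigest()

# Comparison against a boto3 object (strip the quotes S3 puts around ETags):
# same = obj.e_tag.strip('"') == local_md5(fname)
```

This only matches for single-part uploads; multipart ETags are not a plain MD5 (see below in the thread).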
Can you use a small local database, e.g. a text file?
Whenever you download a file, record the object's ETag together with a signature (e.g. an MD5 checksum) of the downloaded content. Next time, before you proceed with downloading, look up the ETag in the 'database'. If it's there, compute the signature of your existing file and compare it with the signature stored for that ETag. If they match, the remote file is the same one you already have.
There's a possibility that the same file will be re-uploaded with different chunking, which changes the ETag. Unless this is likely, you can just accept the false negative and re-download the file in that rare case.
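Alternatively, if you know the part size that was used at upload time (that is the assumption here; 8 MiB is a common default in the AWS SDKs), you can reproduce the multipart ETag locally: S3 computes it as the MD5 of the concatenated per-part MD5 digests, suffixed with a dash and the part count. A sketch:

```python
import hashlib

def multipart_etag(fname, part_size=8 * 1024 * 1024):
    """Reproduce S3's multipart ETag for a local file.

    part_size must match the part size used at upload time (assumed here).
    For a single part, the ETag is just the plain MD5 hex digest.
    """
    part_md5s = []
    with open(fname, 'rb') as f:
        for chunk in iter(lambda: f.read(part_size), b''):
            part_md5s.append(hashlib.md5(chunk).digest())
    if len(part_md5s) == 1:
        return part_md5s[0].hex()
    # MD5 of the concatenated binary part digests, plus '-<part count>'
    combined = hashlib.md5(b''.join(part_md5s))
    return f'{combined.hexdigest()}-{len(part_md5s)}'
```

If the result equals the object's ETag (minus its surrounding quotes), the files match; a mismatch can still be a false negative when a different part size was used.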