How do I download an S3 file only if it has changed?

I have a 900 MB file that I'd like to download from S3 to disk, but only if it isn't already in place. Is there an easy way to do that? I know S3 supports querying the MD5 checksum of a file, but I'm hoping not to have to build this logic myself.

Ztyx asked Sep 18 '17 21:09


2 Answers

You can use AWS CLI's s3 sync command.

Syncs directories and S3 prefixes. Recursively copies new and updated files from the source directory to the destination.

According to this forum thread, you can use sync to synchronize only one file:

    aws s3 sync s3://bucket/path/ local/path/ --exclude "*" --include "File.txt"

This tells sync to compare the given paths, exclude every file, and then include only "File.txt", so "File.txt" is the only file under those paths that gets synced.


Or with the Java SDK:

According to the javadoc, the getObjectMetadata method returns information about an S3 object (file) without downloading its contents.

The method returns an ObjectMetadata object which can give you some useful information:

  • getLastModified method:

Gets the value of the Last-Modified header, indicating the date and time at which Amazon S3 last recorded a modification to the associated object.

  • getContentMD5 method:

Gets the base64 encoded 128-bit MD5 digest of the associated object (content - not including headers) according to RFC 1864.

  • getETag method:

Gets the hex encoded 128-bit MD5 digest of the associated object according to RFC 1864.
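
To illustrate how the ETag could be used, here is a minimal sketch that compares a local file's MD5 with a remote ETag using only the JDK. The class and method names (ETagCheck, md5Hex, needsDownload) are hypothetical and not part of the AWS SDK. One caveat the javadoc quote does not mention: for objects uploaded in multiple parts the ETag is not a plain MD5 of the content and typically ends in "-N", so this sketch treats such ETags as "cannot verify" and reports that a download is needed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

final class ETagCheck {

    /** Streams the file through MD5 and returns the lowercase hex digest. */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md5)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Reading is enough; DigestInputStream updates the digest as a side effect.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /**
     * Returns true when the local copy is missing or its MD5 differs from the ETag.
     * Assumption: an ETag containing '-' came from a multipart upload and is not a
     * plain MD5, so this sketch conservatively reports that a download is needed.
     */
    static boolean needsDownload(Path localFile, String remoteETag)
            throws IOException, NoSuchAlgorithmException {
        if (!Files.isRegularFile(localFile)) {
            return true; // nothing on disk yet
        }
        // Drop surrounding quotes (present in the raw HTTP header) and normalize case.
        String etag = remoteETag.replace("\"", "").toLowerCase(Locale.ROOT);
        if (etag.contains("-")) {
            return true; // cannot be compared with a plain MD5
        }
        return !md5Hex(localFile).equals(etag);
    }
}
```

With the SDK from this answer, the remote value would come from something like amazonS3.getObjectMetadata(bucket, key).getETag(), and the download would run only when needsDownload returns true. Hashing a 900 MB local file still costs a full read from disk, but it avoids the network transfer when nothing has changed.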

juzraai answered Dec 03 '22 20:12


I used the code below to download S3 files whose timestamp is later than the local folder's timestamp. It first checks whether any file under the S3 folder has a timestamp later than the local folder's; if so, only those files are downloaded.

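    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.ObjectListing;
    import com.amazonaws.services.s3.model.S3ObjectSummary;
    import com.amazonaws.services.s3.transfer.Download;
    import com.amazonaws.services.s3.transfer.TransferManager;
    import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.LinkOption;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.Date;

    public class DownloadNewerFiles {
        public static void main(String[] args) throws IOException, InterruptedException {
            String bucket = "bucket";
            Path location = Paths.get("/data/test/");
            AmazonS3 amazonS3 = AmazonS3ClientBuilder.standard().build();
            TransferManager transferManager =
                    TransferManagerBuilder.standard().withS3Client(amazonS3).build();
            try {
                // Timestamp of the local folder. An IOException propagates instead of
                // being swallowed, which previously left the value null.
                Date lastUpdatedTime = new Date(
                        Files.getLastModifiedTime(location, LinkOption.NOFOLLOW_LINKS).toMillis());

                ObjectListing listing = amazonS3.listObjects(bucket, "test-folder");
                while (true) {
                    for (S3ObjectSummary os : listing.getObjectSummaries()) {
                        // Download only objects modified after the local folder was.
                        if (os.getLastModified().after(lastUpdatedTime)) {
                            File target = new File(location.toFile(), os.getKey());
                            Download download = transferManager.download(bucket, os.getKey(), target);
                            download.waitForCompletion(); // blocks; replaces the sleep loop
                        }
                    }
                    if (!listing.isTruncated()) {
                        break;
                    }
                    listing = amazonS3.listNextBatchOfObjects(listing); // next page of results
                }
            } finally {
                transferManager.shutdownNow(); // also shuts down the underlying S3 client
            }
        }
    }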
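
A possible variation on the timestamp idea, sketched here with JDK classes only: compare each object's last-modified time against the matching local file rather than against the folder. On most file systems a directory's timestamp changes when entries are added or removed, not when an existing file's contents change, so a per-file check may be more precise. The class and method names (TimestampCheck, isRemoteNewer) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Date;

final class TimestampCheck {

    /** Returns true when there is no local copy or the remote timestamp is strictly later. */
    static boolean isRemoteNewer(Path localFile, Date remoteLastModified) throws IOException {
        if (!Files.isRegularFile(localFile)) {
            return true; // nothing on disk yet, so the remote object counts as newer
        }
        long localMillis = Files.getLastModifiedTime(localFile).toMillis();
        return remoteLastModified.getTime() > localMillis;
    }
}
```

In a listing loop like the one in this answer, the call would look like isRemoteNewer(target.toPath(), os.getLastModified()), where target is the local file the key maps to. This only works if the local file's timestamp reflects when it was downloaded, which is an assumption about how the files are written.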

dassum answered Dec 03 '22 20:12