
How do I download a file from S3 using boto only if the remote file is newer than a local copy?

I'm trying to download a file from S3 using boto, but only if a local copy of the file is older than the remote file.

I'm using the header 'If-Modified-Since' and the code below:

#!/usr/bin/python
import os
import datetime
import boto
from boto.s3.key import Key

bucket_name = 'my-bucket'

conn = boto.connect_s3()
bucket = conn.get_bucket(bucket_name)

def download(bucket, filename):
    key = Key(bucket, filename)
    headers = {}
    if os.path.isfile(filename):
        print "File exists, adding If-Modified-Since header"
        modified_since = os.path.getmtime(filename)
        timestamp = datetime.datetime.utcfromtimestamp(modified_since)
        headers['If-Modified-Since'] = timestamp.strftime("%a, %d %b %Y %H:%M:%S GMT")
    try:
        key.get_contents_to_filename(filename, headers)
    except boto.exception.S3ResponseError as e:
        return 304
    return 200

print download(bucket, 'README')

The problem is that when the local file does not exist, everything works and the file is downloaded. When I run the script a second time, my function returns 304 as expected, but the previously downloaded file is deleted.

asked Aug 09 '14 03:08 by tuler



1 Answer

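boto.s3.key.Key.get_contents_to_filename opens the file in wb mode, so the file is truncated at the very beginning of the function (see boto/s3/key.py). In addition, it removes the file when an exception is raised. Since the 304 response reaches your code as an S3ResponseError, the existing local copy is deleted.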
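The truncate-on-open behaviour can be seen without boto at all. The following is a minimal standard-library sketch, not boto's own code; the helper name size_after_open is made up for illustration. It opens an existing file in a given mode, writes nothing (as if the download failed before the first byte arrived), and reports how much of the old content survived.

```python
import os
import tempfile

def size_after_open(mode):
    # Hypothetical helper: create a temp file with known content, reopen it
    # in `mode` without writing anything, and return the resulting size.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(b'previously downloaded data')  # 26 bytes
        with open(path, mode):
            pass  # simulate a download that fails before any byte is written
        return os.path.getsize(path)
    finally:
        os.remove(path)

print(size_after_open('wb'))   # 0: content is lost as soon as the file is opened
print(size_after_open('r+b'))  # 26: content is preserved
```

This is why opening with 'r+b' is safe for an existing file: nothing is touched until data is actually written.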

Instead of get_contents_to_filename, you can use get_contents_to_file and open the file yourself with a different mode.

def download(bucket, filename):
    key = Key(bucket, filename)
    headers = {}
    mode = 'wb'
    updating = False
    if os.path.isfile(filename):
        mode = 'r+b'
        updating = True
        print "File exists, adding If-Modified-Since header"
        modified_since = os.path.getmtime(filename)
        timestamp = datetime.datetime.utcfromtimestamp(modified_since)
        headers['If-Modified-Since'] = timestamp.strftime("%a, %d %b %Y %H:%M:%S GMT")
    try:
        with open(filename, mode) as f:
            key.get_contents_to_file(f, headers)
            f.truncate()
    except boto.exception.S3ResponseError as e:
        if not updating:
            # got an error and we are not updating an existing file
            # delete the file that was created due to mode = 'wb'
            os.remove(filename)
        return e.status
    return 200

NOTE: file.truncate is used to handle the case where the new file is smaller than the previous one.
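To illustrate why the truncate call matters, here is a small standard-library sketch (again independent of boto; the names overwrite and demo are hypothetical). It overwrites a longer file with shorter data in 'r+b' mode, once without and once with truncate.

```python
import os
import tempfile

def overwrite(path, data, truncate):
    # Write from offset 0 without truncating on open ('r+b'),
    # then optionally cut the file at the current write position.
    with open(path, 'r+b') as f:
        f.write(data)
        if truncate:
            f.truncate()

def demo(truncate):
    # Start from a file holding older, longer content.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(b'old longer content')
        overwrite(path, b'new', truncate)
        with open(path, 'rb') as f:
            return f.read()
    finally:
        os.remove(path)

print(demo(truncate=False))  # stale tail remains: 'new longer content'
print(demo(truncate=True))   # only the new data: 'new'
```

Without truncate, the tail of the old file would be left behind after the new, shorter content.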

answered Nov 15 '22 18:11 by falsetru