Download file from S3 bucket to user's computer
I am working on a Python/Flask API for a React app. When the user clicks the Download button on the Front-End, I want to download the appropriate file to their machine.
import boto3
s3 = boto3.resource('s3')
s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')
I am currently using some code that finds the path of the Downloads folder and then plugs that path into download_file() as the second parameter, along with the key of the file in the bucket that they are trying to download.
This worked locally and the tests ran fine, but I ran into a problem once it was deployed: the code finds the Downloads path of the SERVER and downloads the file there.
What is the best way to approach this? I have researched and cannot find a good solution for downloading a file from the S3 bucket to the user's Downloads folder. Any help/advice is greatly appreciated.
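For illustration, the current approach is roughly equivalent to the sketch below (the Downloads-folder lookup here is simplified and hypothetical, but it shows why the path resolves on whatever machine runs the code, i.e. the server once deployed):

import os
import boto3

s3 = boto3.resource('s3')

def download_to_downloads_folder(key):
    # expanduser('~') resolves the home directory of the machine running this
    # code -- locally that's my machine, but after deployment it's the server.
    downloads_dir = os.path.join(os.path.expanduser('~'), 'Downloads')
    s3.Bucket('mybucket').download_file(key, os.path.join(downloads_dir, key))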
You can use cp to copy the files from an s3 bucket to your local system. Use the following command: $ aws s3 cp s3://bucket/folder/file.txt .
To download an entire bucket to your local file system, use the AWS CLI sync command, passing it the S3 bucket as a source and a directory on your file system as a destination, e.g. aws s3 sync s3://YOUR_BUCKET . (where the trailing . is the destination directory). The sync command recursively copies the contents of the source to the destination.
You can download a folder from S3 in two ways: one is from the web UI, and the other is with the CLI.
aws s3 sync s3://mybucket . will download all the objects in mybucket to the current directory. This will download all of your files using a one-way sync. It will not delete any existing files in your current directory unless you specify --delete, and it won't change or delete any files on S3.
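If you'd rather do the same thing from Python instead of the CLI, a rough boto3 sketch (assuming a bucket named mybucket, the current directory as the destination, and credentials configured in the environment) could look like this:

import os
import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')

# Walk every object in the bucket and download it, recreating the key's
# "folder" structure under the current directory.
for page in paginator.paginate(Bucket='mybucket'):
    for obj in page.get('Contents', []):
        key = obj['Key']
        if key.endswith('/'):
            continue  # skip zero-byte "folder" placeholder keys
        local_path = os.path.join('.', key)
        os.makedirs(os.path.dirname(local_path) or '.', exist_ok=True)
        s3.download_file('mybucket', key, local_path)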
You should not need to save the file to the server. You can just download the file into memory and then build a Response object containing the file.
from flask import Flask, Response
from boto3 import client

app = Flask(__name__)

def get_client():
    return client(
        's3',
        'us-east-1',
        aws_access_key_id='id',
        aws_secret_access_key='key'
    )

@app.route('/blah', methods=['GET'])
def index():
    s3 = get_client()
    file = s3.get_object(Bucket='blah-test1', Key='blah.txt')
    return Response(
        file['Body'].read(),
        mimetype='text/plain',
        headers={"Content-Disposition": "attachment;filename=test.txt"}
    )

app.run(debug=True, port=8800)
This is OK for small files; there won't be any meaningful wait time for the user. However, with larger files, this will affect UX: the file has to be completely downloaded to the server and then downloaded to the user. To fix this, use the Range keyword argument of the get_object method:
from flask import Flask, Response
from boto3 import client

app = Flask(__name__)

def get_client():
    return client(
        's3',
        'us-east-1',
        aws_access_key_id='id',
        aws_secret_access_key='key'
    )

def get_total_bytes(s3):
    result = s3.list_objects(Bucket='blah-test1')
    for item in result['Contents']:
        if item['Key'] == 'blah.txt':
            return item['Size']

def get_object(s3, total_bytes):
    if total_bytes > 1000000:
        return get_object_range(s3, total_bytes)
    return s3.get_object(Bucket='blah-test1', Key='blah.txt')['Body'].read()

def get_object_range(s3, total_bytes):
    offset = 0
    while total_bytes > 0:
        end = offset + 999999 if total_bytes > 1000000 else ""
        total_bytes -= 1000000
        byte_range = 'bytes={offset}-{end}'.format(offset=offset, end=end)
        offset = end + 1 if not isinstance(end, str) else None
        yield s3.get_object(Bucket='blah-test1', Key='blah.txt', Range=byte_range)['Body'].read()

@app.route('/blah', methods=['GET'])
def index():
    s3 = get_client()
    total_bytes = get_total_bytes(s3)
    return Response(
        get_object(s3, total_bytes),
        mimetype='text/plain',
        headers={"Content-Disposition": "attachment;filename=test.txt"}
    )

app.run(debug=True, port=8800)
This will download the file in 1MB chunks and send them to the user as they are downloaded. Both of these have been tested with a 40MB .txt file.
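As a side note, a similar streaming effect can be had without issuing manual Range requests: botocore's StreamingBody exposes iter_chunks(), and Flask will stream a generator in a Response. A minimal sketch, reusing the bucket and key from the examples above, assuming credentials come from the environment (the /blah-stream route name is just for illustration):

from flask import Flask, Response
from boto3 import client

app = Flask(__name__)

@app.route('/blah-stream', methods=['GET'])
def stream():
    s3 = client('s3', 'us-east-1')
    obj = s3.get_object(Bucket='blah-test1', Key='blah.txt')
    # iter_chunks() reads the body lazily, so the whole file is never
    # buffered on the server at once.
    return Response(
        obj['Body'].iter_chunks(chunk_size=1024 * 1024),
        mimetype='text/plain',
        headers={"Content-Disposition": "attachment;filename=test.txt"}
    )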
A better way to solve this problem is to create a presigned URL. This gives you a temporary URL that is valid for a set amount of time. It also removes your Flask server as a proxy between the S3 bucket and the user, which reduces download time for the user.
import boto3

def get_attachment_url():
    bucket = 'BUCKET_NAME'
    key = 'FILE_KEY'
    client = boto3.client(
        's3',
        aws_access_key_id='YOUR_AWS_ACCESS_KEY',
        aws_secret_access_key='YOUR_AWS_SECRET_KEY'
    )
    return client.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=60
    )
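On the Flask side, a hypothetical route (not part of the original answer) could hand that URL to the React app as JSON; the click handler then simply navigates the browser to it, so the file goes straight from S3 to the user and never passes through your server:

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/download-url', methods=['GET'])
def download_url():
    # Reuses get_attachment_url() from above; the browser follows the
    # returned presigned URL directly to download the file from S3.
    return jsonify({'url': get_attachment_url()})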