
How to download an S3 file in a Serverless Lambda (Python)

I created a Lambda in Python (using Serverless) that is triggered by an SQS message.

handler.py

import json
import logging

import boto3

import const

s3 = boto3.resource('s3')

def process(event, context):
    response = None
    # for record in event['Records']:
    record = event['Records'][0]
    message = dict()
    try:
        message = json.loads(record['body'])

        s3.meta.client.download_file(const.bucket_name,
                                     'class/raw/photo/' + message['photo_name'],
                                     const.raw_filepath + message['photo_name'])

        ...

        response = {
            "statusCode": 200,
            "body": json.dumps(event)
        }

    except Exception as ex:
        error_msg = 'JOB_MSG: {}, EXCEPTION: {}'.format(message, ex)
        logging.error(error_msg)

        response = {
            "statusCode": 500,
            # Exception objects are not JSON-serializable, so send their text
            "body": json.dumps(str(ex))
        }

    return response
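A side note on the handler above: an SQS trigger can deliver several messages in one invocation, so `event['Records']` may contain more than one record, and reading only `Records[0]` would skip the rest. A minimal sketch that collects every `photo_name`, assuming each message body is a JSON object with a `photo_name` key as in the question (the helper name `photo_names_from_sqs_event` is hypothetical):

```python
import json


def photo_names_from_sqs_event(event):
    """Return the photo_name from every SQS record in a Lambda event.

    Assumes each record's body is a JSON object with a "photo_name" key,
    as in the question's messages.
    """
    names = []
    for record in event.get("Records", []):
        message = json.loads(record["body"])
        names.append(message["photo_name"])
    return names


# Hand-built event with two SQS records
event = {
    "Records": [
        {"body": json.dumps({"photo_name": "Student001.JPG"})},
        {"body": json.dumps({"photo_name": "Student002.JPG"})},
    ]
}
print(photo_names_from_sqs_event(event))  # ['Student001.JPG', 'Student002.JPG']
```

Each name can then be handled inside the loop instead of only the first one.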

const.py

bucket_name = 'test'
raw_filepath = '/var/task/raw/'

I created a folder "raw" at the same level as handler.py, then deployed the Serverless Lambda.

I got the following error (from CloudWatch) when the Lambda was triggered:

No such file or directory: u'/var/task/raw/Student001.JPG.94BBBAce'

As I understand it, either the Lambda folder is not accessible or a folder cannot be created in Lambda.

In case it matters for best practices, here are the Lambda's objectives:

  • download the raw file from S3
  • resize the file and upload the new file to another S3 bucket

Any suggestion is appreciated.

Phong Vu asked Jan 21 '19 13:01



2 Answers

If you need to download the object to disk, you can use tempfile together with download_fileobj to save it:

import tempfile

with tempfile.TemporaryFile() as f:
    s3.meta.client.download_fileobj(const.bucket_name,
                                    'class/raw/photo/' + message['photo_name'],
                                    f)
    f.seek(0)
    # continue processing f

Note that Lambda's /tmp storage is limited to 512 MB by default (it can be configured up to 10,240 MB), which caps the size of temporary files.
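If the object might be large, one way to fail early rather than running out of disk mid-download is to compare its size (the `ContentLength` returned by `head_object`) with the free space in the temp directory. A sketch, assuming `s3_client` is a boto3 S3 client such as `boto3.client('s3')` or the question's `s3.meta.client`; the function name `fits_in_tmp` is hypothetical:

```python
import shutil
import tempfile


def fits_in_tmp(s3_client, bucket, key):
    """Return True if s3://bucket/key fits in the temp dir's free space.

    s3_client is assumed to be a boto3 S3 client; head_object fetches
    only the object's metadata, not its body.
    """
    size = s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    # tempfile.gettempdir() is typically /tmp inside Lambda
    free = shutil.disk_usage(tempfile.gettempdir()).free
    return size <= free
```

When this returns False, the in-memory approach (or skipping the file) may be the better option.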

I would argue an even better way is to process it all in memory. Instead of tempfile, you can use io.BytesIO in a very similar fashion:

import io

data_stream = io.BytesIO()
s3.meta.client.download_fileobj(const.bucket_name,
                                'class/raw/photo/' + message['photo_name'],
                                data_stream)
data_stream.seek(0)

This way, the data never has to be written to disk, which (a) is faster and (b) lets you process bigger files, basically up to Lambda's memory limit (3008 MB at the time of writing; it can now be configured up to 10,240 MB).
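Combining the in-memory approach with the question's goal (download, transform, upload to another bucket) might look like the sketch below. `s3_client` is assumed to be a boto3 S3 client; `transform` is a hypothetical callable that takes the original bytes and returns the new bytes (for the resize step it could wrap an imaging library such as Pillow, not shown here), and the function name is likewise illustrative:

```python
import io


def transform_s3_object(s3_client, src_bucket, src_key, dst_bucket, dst_key, transform):
    """Download an S3 object into memory, transform it, and upload the result.

    Nothing touches the disk; both buffers live in the function's memory.
    """
    source = io.BytesIO()
    s3_client.download_fileobj(src_bucket, src_key, source)
    source.seek(0)  # rewind before reading what was just written

    # A BytesIO created from bytes starts at position 0, ready to be read
    result = io.BytesIO(transform(source.read()))
    # upload_fileobj takes the file object first, then bucket and key
    s3_client.upload_fileobj(result, dst_bucket, dst_key)
```

Passing the transform in keeps the S3 plumbing separate from the image-processing code, which makes each part easier to test on its own.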

Milan Cermak answered Oct 22 '22 00:10



In one of my projects I converted WebP files to JPG. You can refer to the following GitHub link to get an idea of the approach:

https://github.com/adjr2/webp-to-jpg/blob/master/codes.py

You can access the downloaded file directly in the Lambda function. I am not sure whether you can create a new folder (I am pretty new to all this stuff myself), but you can certainly manipulate the file and upload it back to the same (or a different) S3 bucket.
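If you prefer to work with files on disk, as in the question, the download must target a writable location: the deployment package under /var/task is read-only, while /tmp is writable scratch space. A sketch of a download, process, upload round trip, assuming a boto3 S3 client and a hypothetical `process_file(in_path, out_path)` callback that does the actual conversion or resize (the function name `roundtrip_via_tmp` is also illustrative):

```python
import os
import tempfile


def roundtrip_via_tmp(s3_client, src_bucket, src_key, dst_bucket, dst_key, process_file):
    """Download to a temp directory, process the file, upload the result.

    The directory is created under the system temp dir (typically /tmp on
    Lambda) and deleted afterwards, because /tmp can persist between
    invocations that reuse the same execution environment.
    """
    with tempfile.TemporaryDirectory() as workdir:
        in_path = os.path.join(workdir, "in_" + os.path.basename(src_key))
        out_path = os.path.join(workdir, "out_" + os.path.basename(dst_key))
        s3_client.download_file(src_bucket, src_key, in_path)
        process_file(in_path, out_path)  # hypothetical callback
        # upload_file takes the local path first, then bucket and key
        s3_client.upload_file(out_path, dst_bucket, dst_key)
```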

Hope it helps. Cheers!

adjr2 answered Oct 22 '22 01:10
