
AWS Lambda and S3 and Pandas - Load CSV into S3, trigger Lambda, load into pandas, put back in bucket?

I'm a noob to AWS and lambda, so I apologize if this is a dumb question. What I would like to be able to do is load a spreadsheet into an s3 bucket, trigger lambda based on that upload, have lambda load the csv into pandas and do stuff with it, then write the dataframe back to a csv into a second s3 bucket.

I've read a lot about zipping a python script with all of its libraries and dependencies and uploading that, and that's a separate question. I've also figured out how to trigger Lambda upon uploading a file to an S3 bucket and how to automatically copy that file to a second S3 bucket.

The part I'm having trouble finding any information on is the middle part: loading the file into pandas and manipulating it within pandas, all inside the Lambda function.

First question: is something like that even possible? Second question: how do I "grab" the file from the S3 bucket and load it into pandas? Would it be something like this?

import pandas as pd
import boto3
import json

s3 = boto3.resource('s3')

def handler(event, context):
    dest_bucket = s3.Bucket('my-destination-bucket')
    df = pd.read_csv(event['Records'][0]['s3']['object']['key'])
    # stuff to do with dataframe goes here

    s3.Object(dest_bucket.name, <code for file key>).copy_from(CopySource = df)

I really have no idea if that's even close to right; it's a complete shot in the dark. Any and all help would be really appreciated, because I'm pretty obviously out of my element!

asked Nov 07 '22 by Tkelly

1 Answer

This code runs in a Lambda function that is triggered by a PUT; it GETs the object that arrived and then PUTs it back to S3 under a destination key:

import boto3
from urllib.parse import unquote_plus

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # The triggering object's bucket and key arrive in the event record;
    # keys are URL-encoded, so decode them before use.
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = unquote_plus(event['Records'][0]['s3']['object']['key'])
    end_path = key  # destination key for the copy; adjust to suit
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        s3_upload_article(response['Body'].read(), bucket, end_path)
        return response['ContentType']
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e

def s3_upload_article(html, bucket, end_path):
    # PUT the object body back to S3 at the destination key.
    s3.put_object(Body=html, Bucket=bucket, Key=end_path, ContentType='text/html', ACL='public-read')

I broke this code out of a more complicated Lambda script I had written, but I hope it shows some of what you need to do. The PUT of the object only triggers the script; any actions that happen after the event fires are up to you to code into it.

bucket = event['Records'][0]['s3']['bucket']['name']
key = unquote_plus(event['Records'][0]['s3']['object']['key'])

The bucket and key in those first lines identify the object that triggered the event. Everything else is up to you.
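
To answer your two questions directly: yes, this is possible, and pandas slots into the same pattern. Below is a minimal sketch assuming pandas is packaged with your deployment (your separate question) and using a hypothetical destination bucket name, my-destination-bucket; treat it as a starting point, not a drop-in solution:

import io
import boto3
import pandas as pd
from urllib.parse import unquote_plus

s3 = boto3.client('s3')
DEST_BUCKET = 'my-destination-bucket'  # hypothetical; replace with your second bucket

def lambda_handler(event, context):
    # Identify the object that fired the event (keys arrive URL-encoded).
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = unquote_plus(event['Records'][0]['s3']['object']['key'])

    # GET the CSV and read it straight into a dataframe.
    response = s3.get_object(Bucket=bucket, Key=key)
    df = pd.read_csv(io.BytesIO(response['Body'].read()))

    # stuff to do with dataframe goes here

    # Serialize the dataframe back to CSV in memory and PUT it
    # into the second bucket under the same key.
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)
    s3.put_object(Bucket=DEST_BUCKET, Key=key, Body=buffer.getvalue())

Note that this reads and writes through in-memory buffers rather than your copy_from guess: CopySource expects an existing S3 object, not a dataframe, so you serialize the dataframe yourself and PUT the result.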

answered Nov 14 '22 by Nicholas Martinez