Reading data from S3 using Lambda

Tags: python, json, amazon-web-services, amazon-s3, aws-lambda

I have a number of JSON files stored in an S3 bucket on AWS.

I wish to use the AWS Lambda Python service to parse this JSON and send the parsed results to an AWS RDS MySQL database.

I have a stable Python script for doing the parsing and writing to the database. I need the Lambda script to iterate through the JSON files (when they are added).

Each JSON file contains a list, simply consisting of results = [content]

In pseudo-code what I want is:

1. Connect to the S3 bucket (jsondata)
2. Read the contents of the JSON file (results)
3. Execute my script for this data (results)

I can list the buckets I have by:

import boto3

s3 = boto3.resource('s3')

for bucket in s3.buckets.all():
    print(bucket.name)

Giving:

jsondata

But I cannot access this bucket to read its results.

There doesn't appear to be a read or load function.

I wish for something like

for bucket in s3.buckets.all():
   print(bucket.contents)

EDIT

I was misunderstanding something. Rather than reading the file in place in S3, Lambda must download it itself.

From here it seems that you must give Lambda a download path, from which it can then access the file:

import uuid

import boto3

s3_client = boto3.client('s3')

def process_file(path):
    # Placeholder for the existing parsing and database-writing script
    pass

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        # Download the new object to Lambda's writable /tmp directory
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
        s3_client.download_file(bucket, key, download_path)

367

asked Nov 18 '15 14:11

LearningSlowly

2 Answers

s3 = boto3.client('s3')
response = s3.get_object(Bucket=bucket, Key=key)
emailcontent = response['Body'].read().decode('utf-8')
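
To connect this to the question, here is a minimal sketch of a handler that reads each newly added JSON file into memory with get_object and parses it, without writing to /tmp. The names process_results and read_json_object and the optional s3_client argument are assumptions made for illustration; process_results only stands in for the asker's existing parsing and database script. The sketch also assumes the function is triggered by S3 object-created events, whose keys arrive URL-encoded.

```python
import json
from urllib.parse import unquote_plus


def process_results(results):
    """Hypothetical stand-in for the existing parse-and-write-to-MySQL script."""
    return len(results)


def read_json_object(s3_client, bucket, key):
    """Read one S3 object into memory and parse it as JSON."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # response["Body"] is a streaming body; read() returns the raw bytes
    return json.loads(response["Body"].read().decode("utf-8"))


def handler(event, context, s3_client=None):
    """Process every object referenced in an S3 event notification."""
    if s3_client is None:
        # Imported lazily so the helpers can be exercised without AWS access
        import boto3
        s3_client = boto3.client("s3")

    processed_keys = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded
        key = unquote_plus(record["s3"]["object"]["key"])
        results = read_json_object(s3_client, bucket, key)
        process_results(results)
        processed_keys.append(key)
    return processed_keys
```

Lambda only passes event and context, so the extra s3_client parameter is ignored in production and exists purely to make local testing with a fake client possible. In a real function the client would more commonly be created once at module level so it is reused across invocations.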

146

answered Oct 09 '22 11:10

James Hogbin

You can use bucket.objects.all() to get a list of all the objects in the bucket (you also have alternative methods such as filter, page_size and limit, depending on your needs).

These methods return an iterator of S3.ObjectSummary objects. From there you can call the get method on each object to retrieve the file.
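
As a rough sketch of that approach, the function below walks a boto3 Bucket resource and parses every object whose key ends in .json. The function name iter_json_objects, the prefix parameter, and the .json suffix check are assumptions for illustration and are not part of the original question.

```python
import json


def iter_json_objects(bucket, prefix=""):
    """Yield (key, parsed_json) for each .json object in a boto3 Bucket resource."""
    # objects.all() lists everything; objects.filter(Prefix=...) narrows the listing
    summaries = bucket.objects.filter(Prefix=prefix) if prefix else bucket.objects.all()
    for summary in summaries:
        if not summary.key.endswith(".json"):
            continue
        # ObjectSummary.get() fetches the object; "Body" is a readable stream
        body = summary.get()["Body"].read()
        yield summary.key, json.loads(body.decode("utf-8"))


# Example usage (requires AWS credentials and the bucket from the question):
#   import boto3
#   bucket = boto3.resource("s3").Bucket("jsondata")
#   for key, results in iter_json_objects(bucket):
#       print(key, len(results))
```

This lists the whole bucket on every call, so it suits a one-off backfill of existing files better than reacting to individual uploads.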

answered Oct 09 '22 10:10

Dysosmus
