Writing pandas dataframe to S3 bucket (AWS)

I have an AWS Lambda function which queries an API and creates a dataframe. I want to write this dataframe to an S3 bucket as a CSV file. I am using:

import pandas as pd
import s3fs

df.to_csv('s3.console.aws.amazon.com/s3/buckets/info/test.csv', index=False)

I am getting an error:

No such file or directory: 's3.console.aws.amazon.com/s3/buckets/info/test.csv'

But that directory exists, because I am reading files from there. What is the problem here?

I've been reading files from the bucket like this:

import boto3

s3_client = boto3.client('s3')
s3_client.download_file('info', 'secrets.json', '/tmp/secrets.json')

How can I upload the whole dataframe to an S3 bucket?

asked Apr 16 '20 by Jonas Palačionis


2 Answers

You can also use the boto3 package to store data in S3:

from io import StringIO  # Python 3 (use BytesIO for Python 2)
import boto3

bucket = 'info'  # bucket must already exist on S3

# Serialise the dataframe to an in-memory buffer instead of a local file
csv_buffer = StringIO()
df.to_csv(csv_buffer)

# Upload the buffer's contents as an object in the bucket
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, 'df.csv').put(Body=csv_buffer.getvalue())
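
If you would rather not build the CSV in memory, the same upload can go through a temporary file; a minimal sketch mirroring the download_file call from the question, assuming the same 'info' bucket and using Lambda's writable /tmp directory as scratch space:

import boto3

# Write the dataframe to Lambda's writable /tmp directory first
df.to_csv('/tmp/test.csv', index=False)

# Then upload the file as an object in the bucket
s3_client = boto3.client('s3')
s3_client.upload_file('/tmp/test.csv', 'info', 'test.csv')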
answered Oct 28 '22 by wowkin2


This

"s3.console.aws.amazon.com/s3/buckets/info/test.csv"

is not an S3 URI; it is the address of the S3 web console. You need to pass an S3 URI (s3://bucket/key) to save to S3. Moreover, you do not need to import s3fs; it only needs to be installed, since pandas uses it under the hood.

Just try:

import pandas as pd

# df is the dataframe you built in the Lambda function
# General form: df.to_csv("s3://<bucket_name>/<obj_key>")

# In your case:
df.to_csv("s3://info/test.csv")

NOTE: You need to create the bucket on AWS S3 first.
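
On recent pandas versions (1.2 and later) you can also pass credentials explicitly via the storage_options argument instead of relying on the environment; a minimal sketch, assuming the same bucket name and placeholder credentials:

import pandas as pd

# storage_options is forwarded to s3fs under the hood (pandas >= 1.2)
df.to_csv(
    "s3://info/test.csv",
    index=False,
    storage_options={"key": "YOUR_ACCESS_KEY", "secret": "YOUR_SECRET_KEY"},
)

# Reading the object back works the same way
df_check = pd.read_csv(
    "s3://info/test.csv",
    storage_options={"key": "YOUR_ACCESS_KEY", "secret": "YOUR_SECRET_KEY"},
)

Inside a Lambda function the execution role normally supplies credentials, so storage_options can usually be omitted there.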

answered Oct 28 '22 by null