Save Dataframe to csv directly to s3 Python

I have a pandas DataFrame that I want to upload to S3 as a new CSV file. The problem is that I don't want to save the file locally before transferring it to S3. Is there a method like to_csv for writing the DataFrame to S3 directly? I am using boto3.
Here is what I have so far:

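import boto3
import pandas as pd

s3 = boto3.client('s3', aws_access_key_id='key', aws_secret_access_key='secret_key')
# get_object takes keyword arguments; bucket and key hold the source bucket name and object key
read_file = s3.get_object(Bucket=bucket, Key=key)
df = pd.read_csv(read_file['Body'])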

# Make alterations to DataFrame

# Then export DataFrame to CSV through direct transfer to s3

asked Jul 01 '16 21:07

user2494275

3 Answers

You can use:

from io import StringIO # python3; python2: BytesIO 
import boto3

bucket = 'my_bucket_name' # already created on S3
csv_buffer = StringIO()
df.to_csv(csv_buffer)
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, 'df.csv').put(Body=csv_buffer.getvalue())
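
The same idea can be wrapped in a small helper built on the boto3 client's put_object call. This is only a sketch: upload_dataframe_as_csv is a hypothetical name, the bucket is assumed to exist already, and credentials are assumed to come from whatever client you pass in.

```python
from io import StringIO


def upload_dataframe_as_csv(df, s3_client, bucket, key, index=False):
    """Serialize df to CSV in memory and upload it with one put_object call.

    df        -- a pandas DataFrame
    s3_client -- a boto3 S3 client, e.g. boto3.client('s3')
    bucket    -- name of an existing bucket (assumption: already created)
    key       -- object key to write, e.g. 'df.csv'
    Returns the number of bytes sent.
    """
    csv_buffer = StringIO()
    df.to_csv(csv_buffer, index=index)
    # Encode explicitly so the uploaded object is UTF-8 bytes rather than str.
    body = csv_buffer.getvalue().encode('utf-8')
    s3_client.put_object(Bucket=bucket, Key=key, Body=body, ContentType='text/csv')
    return len(body)
```

It would be called as upload_dataframe_as_csv(df, boto3.client('s3'), 'my_bucket_name', 'df.csv'). Nothing is written to local disk, but the whole CSV is held in memory, so this suits DataFrames that fit comfortably in RAM.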

answered Oct 07 '22 02:10

Stefan

You can pass the S3 path directly to to_csv and read_csv. I am using pandas 0.24.1.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame( [ [1, 1, 1], [2, 2, 2] ], columns=['a', 'b', 'c'])

In [3]: df
Out[3]:
   a  b  c
0  1  1  1
1  2  2  2

In [4]: df.to_csv('s3://experimental/playground/temp_csv/dummy.csv', index=False)

In [5]: pd.__version__
Out[5]: '0.24.1'

In [6]: new_df = pd.read_csv('s3://experimental/playground/temp_csv/dummy.csv')

In [7]: new_df
Out[7]:
   a  b  c
0  1  1  1
1  2  2  2

Release Note:

S3 File Handling

pandas now uses s3fs for handling S3 connections. This shouldn’t break any code. However, since s3fs is not a required dependency, you will need to install it separately, like boto in prior versions of pandas. GH11915.
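
If you need to pass credentials explicitly, as in the question, newer pandas releases accept a storage_options dict that is forwarded to s3fs. The sketch below assumes pandas 1.2 or later with s3fs installed; write_csv_to_s3 is a hypothetical helper name, and key/secret are the same S3FileSystem arguments used elsewhere on this page.

```python
def write_csv_to_s3(df, s3_url, key=None, secret=None):
    """Write df to an s3:// URL, optionally with explicit credentials.

    Assumes pandas >= 1.2 (storage_options) and an installed s3fs.
    When key/secret are omitted, s3fs falls back to the default
    credential lookup (environment variables, ~/.aws, instance role).
    """
    storage_options = None
    if key is not None and secret is not None:
        storage_options = {'key': key, 'secret': secret}
    df.to_csv(s3_url, index=False, storage_options=storage_options)
    return storage_options
```

For example, write_csv_to_s3(df, 's3://experimental/playground/temp_csv/dummy.csv') relies on the default credential chain, while passing key= and secret= avoids depending on the environment.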

answered Oct 07 '22 02:10

yardstick17

I like s3fs, which lets you use S3 (almost) like a local filesystem.

You can do this:

import s3fs

bytes_to_write = df.to_csv(None).encode()
fs = s3fs.S3FileSystem(key=key, secret=secret)
with fs.open('s3://bucket/path/to/file.csv', 'wb') as f:
    f.write(bytes_to_write)

s3fs supports only the rb and wb modes for opening a file, which is why I encode the CSV into bytes_to_write before writing it.

answered Oct 07 '22 03:10

michcio1234

Tags: python, dataframe, csv, amazon-s3, boto3