Write pandas dataframe as compressed CSV directly to Amazon s3 bucket?

I currently have a script that reads the existing version of a CSV saved to S3, combines it with the new rows in a pandas DataFrame, and then writes the result directly back to S3.

    try:
        csv_prev_content = str(s3_resource.Object('bucket-name', ticker_csv_file_name).get()['Body'].read(), 'utf8')
    except:
        csv_prev_content = ''

    csv_output = csv_prev_content + curr_df.to_csv(path_or_buf=None, header=False)
    s3_resource.Object('bucket-name', ticker_csv_file_name).put(Body=csv_output)

Is there a way to do this with a gzip-compressed CSV? I want to read an existing .gz compressed CSV on S3 if there is one, concatenate it with the contents of the DataFrame, and then overwrite the .gz with the new combined compressed CSV directly in S3, without having to make a local copy.

asked May 02 '17 02:05

rosstripi

2 Answers

Here's a solution in Python 3.5.2 using Pandas 0.20.1.

The source DataFrame can be read from S3, a local CSV, or any other source.

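    import boto3
    import gzip
    import pandas as pd
    from io import BytesIO, TextIOWrapper

    # Reading an s3:// URL with pandas requires the s3fs package.
    df = pd.read_csv('s3://ramey/test.csv')
    gz_buffer = BytesIO()

    # Write the CSV text through a gzip stream into the in-memory buffer.
    with gzip.GzipFile(mode='w', fileobj=gz_buffer) as gz_file:
        text_wrapper = TextIOWrapper(gz_file, encoding='utf8')
        df.to_csv(text_wrapper, index=False)
        # Push any buffered text into the gzip stream before it is closed.
        text_wrapper.flush()

    # Upload the compressed bytes; no local file is created.
    s3_resource = boto3.resource('s3')
    s3_object = s3_resource.Object('ramey', 'new-file.csv.gz')
    s3_object.put(Body=gz_buffer.getvalue())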
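The snippet above covers only the write step. For the read, append, and overwrite cycle described in the question, something along these lines may work. It is a minimal sketch, not a tested production routine: `append_csv_to_gzip` and `update_s3_csv_gz` are hypothetical helper names, `s3_object` is assumed to be a boto3 `s3.Object`, and a missing key is assumed to surface as the client's `NoSuchKey` error.

```python
import gzip


def append_csv_to_gzip(prev_gz_bytes: bytes, new_csv_text: str) -> bytes:
    """Return gzip bytes holding the previous CSV text followed by new_csv_text."""
    # Empty input means there was no existing object, so start from nothing.
    prev_text = gzip.decompress(prev_gz_bytes).decode('utf8') if prev_gz_bytes else ''
    return gzip.compress((prev_text + new_csv_text).encode('utf8'))


def update_s3_csv_gz(s3_object, new_csv_text: str) -> None:
    """Read, extend, and overwrite a gzip CSV object, entirely in memory.

    s3_object is assumed to be a boto3 s3.Object (hypothetical usage).
    """
    try:
        prev_gz_bytes = s3_object.get()['Body'].read()
    except s3_object.meta.client.exceptions.NoSuchKey:
        # Assumption: the key does not exist yet, so there is nothing to append to.
        prev_gz_bytes = b''
    s3_object.put(Body=append_csv_to_gzip(prev_gz_bytes, new_csv_text))
```

With the names from the question, a call might look like `update_s3_csv_gz(s3_resource.Object('bucket-name', ticker_csv_file_name), curr_df.to_csv(header=False))`. Note that this rewrites the whole object on every update, since S3 objects cannot be appended to in place.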

answered Sep 22 '22 11:09

ramhiser

There is a more elegant solution using smart-open (https://pypi.org/project/smart-open/), which compresses transparently based on the .gz extension:

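    import pandas as pd
    from smart_open import open

    # df is an existing DataFrame. The with block closes the stream,
    # which is what finalizes the gzip data and completes the upload.
    with open('s3://bucket/prefix/filename.csv.gz', 'w') as fout:
        df.to_csv(fout, index=False)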

answered Sep 24 '22 11:09

Alexander Lobkovsky Meitiv

Tags: python, pandas, csv, amazon-web-services, amazon-s3
