 

How to write parquet file from pandas dataframe in S3 in python

I have a pandas DataFrame that I want to write to a Parquet file in S3. I need sample code for this. I tried to Google it, but I could not find a working sample.

Alexsander asked Nov 21 '18 16:11


People also ask

Can S3 store parquet?

Amazon S3 Inventory gives you a flat-file list of your objects and metadata. You can get the S3 inventory in CSV, ORC, or Parquet format.

How do I write pandas DataFrame to parquet?

The to_parquet() function writes a DataFrame to the binary Parquet format. Its first argument is a file path or root directory path; the latter is used as the root directory when writing a partitioned dataset.


1 Answer

For your reference, the following code works for me:

s3_url = 's3://bucket/folder/bucket.parquet.gzip'
df.to_parquet(s3_url, compression='gzip')

In order to use to_parquet, you need pyarrow or fastparquet to be installed. Also, make sure you have the correct information in your config and credentials files, located in the ~/.aws folder.

Edit: Additionally, s3fs is needed. See https://stackoverflow.com/a/54006942/1862909

Wai Kiat answered Sep 30 '22 13:09