I have a pandas DataFrame that I want to write to a Parquet file in S3. I need sample code for this. I tried to Google it, but I could not find a working example.
Pandas provides the DataFrame.to_parquet() function, which writes a DataFrame to the binary Parquet format. Its first argument is a file path, or a root directory path when writing a partitioned dataset.
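For illustration, a minimal local sketch (the column names are hypothetical) showing both a single-file write and a partitioned write:

```python
import pandas as pd

df = pd.DataFrame({"year": [2023, 2023, 2024], "value": [1.0, 2.0, 3.0]})

# Single file: the path names the output file.
df.to_parquet("data.parquet")

# Partitioned dataset: the path is used as the root directory,
# with one subdirectory per value of the partition column.
df.to_parquet("data_by_year", partition_cols=["year"])
```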
For your reference, the following code works for me:
```python
s3_url = 's3://bucket/folder/bucket.parquet.gzip'
df.to_parquet(s3_url, compression='gzip')
```
In order to use `to_parquet`, you need `pyarrow` or `fastparquet` to be installed. Also, make sure you have the correct information in your `config` and `credentials` files, located in the `~/.aws` folder.
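For reference, a minimal sketch of what those two files typically look like (the values and the region are placeholders you must replace):

```ini
# ~/.aws/credentials
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

# ~/.aws/config
[default]
region = us-east-1
```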
Edit: additionally, `s3fs` is needed; see https://stackoverflow.com/a/54006942/1862909
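Putting it together, a minimal end-to-end sketch (assuming `pyarrow` and `s3fs` are installed, and that `my-bucket` is a placeholder for a bucket you can write to):

```python
import pandas as pd

# A small sample frame to write out.
df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# Credentials are picked up from ~/.aws (or the usual AWS environment variables).
s3_url = "s3://my-bucket/demo/df.parquet.gzip"
df.to_parquet(s3_url, compression="gzip")

# Read it back to verify the round trip.
print(pd.read_parquet(s3_url))
```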