 

PySpark: Save DataFrame to S3

I want to save a DataFrame to S3, but when I write the file, it creates an empty file named ${folder_name} next to the folder in which I want to save the data.

Syntax used to save the DataFrame:

df.write.parquet("s3n://bucket-name/shri/test")

It saves the data in the test folder, but it also creates an extra $test entry under shri.

Is there a way I can save it without creating that extra folder?

Shrikant asked Aug 24 '17


People also ask

How do I write a PySpark DataFrame to an S3 bucket?

Use the DataFrame's write attribute, which returns a Spark DataFrameWriter, to save a Spark DataFrame to an Amazon S3 bucket in CSV format.


1 Answer

I was able to do it using the code below:

df.write.parquet("s3a://bucket-name/shri/test.parquet", mode="overwrite")
Usman Azhar answered Sep 19 '22