Pyspark dataframe write to single json file with specific name

I have a DataFrame that I want to write as a single JSON file with a specific name. I tried the following:

df2 = df1.select(df1.col1, df1.col2)
df2.write.format('json').save('/path/file_name.json')  # didn't work: creates a folder 'file_name.json' containing part-XXX files
df2.toJSON().saveAsTextFile('/path/file_name.json')  # didn't work: same result, a folder 'file_name.json' containing part-XXX files

I'd appreciate it if someone could provide a solution.

Lijju Mathew asked Apr 07 '17 03:04



People also ask

How do you write data to a file in PySpark?

In Spark, you can save (write/extract) a DataFrame as CSV with dataframeObj.write.csv("path"). The same call also writes to AWS S3, Azure Blob Storage, HDFS, or any other Spark-supported file system. As with JSON, the path is treated as a directory that Spark fills with part files.

What does explode() do to a JSON field?

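The explode() function turns an array (or map) column into multiple rows: each element of the array becomes its own row, with the other columns repeated. Rows whose array is null or empty are dropped; explode_outer() keeps them with a null instead.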
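To make the row-splitting concrete, here is a pure-Python model of explode() semantics on a list of dicts. It is an illustration, not Spark code: `explode_rows` and the sample rows are hypothetical. In PySpark the corresponding call would be something like `df.select("id", F.explode("tags").alias("tags"))` with `from pyspark.sql import functions as F`; Spark names the exploded column `col` unless you alias it, while this model simply reuses the original column name.

```python
def explode_rows(rows, column):
    """Pure-Python model of Spark's explode() on an array column.

    rows: list of dicts standing in for DataFrame rows (hypothetical data).
    Each element of row[column] becomes its own output row; rows whose
    array is None or empty produce no output, as with explode().
    """
    out = []
    for row in rows:
        values = row.get(column) or []  # None and [] both yield no rows
        for value in values:
            new_row = dict(row)         # copy so the input row is not mutated
            new_row[column] = value
            out.append(new_row)
    return out


rows = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": []},
    {"id": 3, "tags": None},
]
print(explode_rows(rows, "tags"))
```

Row 1 becomes two rows (one per tag), while rows 2 and 3 disappear, which is the difference between explode() and explode_outer().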

What is multiline JSON?

Spark's JSON data source provides the multiLine option for reading records that span multiple lines. By default, Spark expects each line of a JSON file to hold one complete, self-contained record (the JSON Lines layout), so you need to enable multiLine to read JSON documents that are spread over several lines, such as pretty-printed files.
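The difference between the two layouts can be shown with Python's standard json module alone. This is an analogy, not Spark's parser: parsing line by line (Spark's default) handles JSON Lines but fails on pretty-printed JSON, which has to be parsed as a whole document, which is what `spark.read.option("multiLine", True).json(path)` enables. The sample strings and function names are made up for illustration.

```python
import json

# JSON Lines: one complete record per line (Spark's default expectation).
json_lines_text = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

# Multi-line JSON: the same records pretty-printed as one top-level array.
multi_line_text = '[\n  {"id": 1, "name": "a"},\n  {"id": 2, "name": "b"}\n]\n'


def parse_json_lines(text):
    """Parse every non-blank line as an independent JSON record."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_multi_line(text):
    """Parse the whole text as one document; a top-level array yields one record per element."""
    doc = json.loads(text)
    return doc if isinstance(doc, list) else [doc]


print(parse_json_lines(json_lines_text))
print(parse_multi_line(multi_line_text))

try:
    parse_json_lines(multi_line_text)  # the first line is just "[", not a full record
except json.JSONDecodeError as exc:
    print("line-by-line parsing failed:", exc)
```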


2 Answers

To get a single file, coalesce the DataFrame to one partition before writing:

df2 = df1.select(df1.col1,df1.col2)
df2.coalesce(1).write.format('json').save('/path/file_name.json')

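This still creates a folder named file_name.json, but inside it you will find a single file whose name starts with part-00000 and that holds all of the data (usually alongside a _SUCCESS marker). Because coalesce(1) funnels the whole write through one task, this is best suited to data small enough for a single executor.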
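Spark always treats the output path as a directory, so getting a file with a specific name takes one more step after the write: move the part file out and delete the folder. A minimal sketch, assuming the output lands on a local or locally mounted filesystem (on HDFS or S3 the same rename would have to go through that storage's own API). `promote_single_part` and `/path/tmp_json_out` are hypothetical names. Write to a temporary folder first, since the final file cannot have the same path as the folder Spark writes to.

```python
import glob
import os
import shutil


def promote_single_part(output_dir, target_path, pattern="part-*"):
    """Move the one part file Spark wrote into output_dir to target_path.

    Hypothetical helper. Assumes a local (or locally mounted) filesystem and
    that the DataFrame was written with coalesce(1), so exactly one data file
    matches `pattern`. Checksum files such as .part-00000-....json.crc start
    with a dot, so "part-*" does not match them.
    """
    parts = glob.glob(os.path.join(output_dir, pattern))
    if len(parts) != 1:
        raise ValueError(
            f"expected exactly one {pattern!r} file in {output_dir}, found {len(parts)}"
        )
    # target_path must not live inside output_dir (it is deleted below) and
    # must not be an existing directory (shutil.move would move the file into it).
    shutil.move(parts[0], target_path)
    # Remove the leftover _SUCCESS marker, .crc files and the folder itself.
    shutil.rmtree(output_dir)
    return target_path


# Example (requires a SparkSession; paths are placeholders):
# df2.coalesce(1).write.mode("overwrite").json("/path/tmp_json_out")
# promote_single_part("/path/tmp_json_out", "/path/file_name.json")
```

Raising when zero or several part files are found guards against forgetting the coalesce(1), which would otherwise silently keep only part of the data.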

Rakesh Kumar answered Oct 12 '22 07:10



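You can also do it by converting to a pandas DataFrame first. Note that toPandas() collects the entire DataFrame into the driver's memory, so this only suits data that fits on a single machine: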

df.toPandas().to_json('path/file_name.json', orient='records', force_ascii=False, lines=True)
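If pandas is unavailable, or you would rather not materialize the whole DataFrame at once, a similar single-file JSON Lines output can be written with the standard json module. A minimal sketch: `write_json_lines` is a hypothetical helper, and the rows would come from something like `(row.asDict(recursive=True) for row in df.toLocalIterator())`, which holds roughly one partition in driver memory at a time. Unlike pandas, it serializes non-JSON types such as dates with `str()`, which is a simplifying assumption.

```python
import json


def write_json_lines(records, path):
    """Write an iterable of dicts to one JSON Lines file (one object per line).

    Hypothetical helper. In PySpark, `records` could be
    (row.asDict(recursive=True) for row in df.toLocalIterator()).
    Returns the number of records written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable, like force_ascii=False in pandas;
            # default=str is an assumption for dates, Decimals and other non-JSON types.
            f.write(json.dumps(record, ensure_ascii=False, default=str) + "\n")
            count += 1
    return count
```

All rows still pass through the driver and end up on its local filesystem, so this trades pandas' memory footprint for a slower row-by-row write rather than removing the single-machine limit.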
fedosique answered Oct 12 '22 07:10
