I have a DataFrame that I am trying to save as a JSON file using PySpark 1.4, but it doesn't seem to be working. When I give it the path to the directory, it returns an error stating the directory already exists. My assumption, based on the documentation, was that it would save a JSON file at the path you give it.
df.write.json("C:\Users\username")
Specifying a directory with a name doesn't produce any file and gives an error of "java.io.IOException: Mkdirs failed to create file:/C:Users/username/test/_temporary/....etc. It does, however, create a directory named test containing several sub-directories with blank CRC files.
df.write.json("C:\Users\username\test")
And adding a .JSON file extension produces the same error:
df.write.json("C:\Users\username\test.JSON")
Pandas' to_json() is a built-in DataFrame method that converts the object to a JSON string; when you pass it a path, it exports the DataFrame to a JSON file instead.
Could you not just use
df.toJSON()
as shown here? If not, first convert it to a pandas DataFrame and then write that to JSON:
pandas_df = df.toPandas()
pandas_df.to_json("C:\\Users\\username\\test.JSON")
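Note that df.toJSON() does not write anything to disk by itself; it returns an RDD of JSON strings, one per row. To persist that directly you still go through a Spark save action, which writes part files into a directory much like write.json does. A minimal sketch (the output path is a placeholder):

# df.toJSON() yields an RDD where each element is one row serialized as a JSON string
json_rdd = df.toJSON()
# saveAsTextFile writes part files into the given directory, which must not already exist
json_rdd.saveAsTextFile("C:\\Users\\username\\json_output")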
When working with large data, converting a PySpark DataFrame to pandas is not advisable. You can use the command below to save a JSON file in an output directory, where df is a pyspark.sql.dataframe.DataFrame. The cluster will generate a part file inside the output directory.
df.coalesce(1).write.format('json').save('/your_path/output_directory')
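If you need a single file with a fixed name rather than a Spark-generated part file, a common follow-up is to locate the part file afterwards and copy it out. A minimal sketch using the standard library (the paths are placeholders):

import glob
import shutil

# coalesce(1) leaves exactly one part file inside the output directory;
# its exact name is generated by Spark, so glob for it.
part_file = glob.glob('/your_path/output_directory/part-*')[0]
shutil.copyfile(part_file, '/your_path/output.json')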
I would avoid using write.json since it's causing problems on Windows. Writing the file with Python directly should skip creating the temp directories that are giving you issues.
with open("C:\\Users\\username\\test.json", "w+") as output_file:
output_file.write(df.toJSON())
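If the DataFrame is too large to collect comfortably on the driver, and your PySpark version provides RDD.toLocalIterator(), you can stream the rows to the file one at a time instead; a sketch under that assumption:

# Stream one serialized row at a time instead of materializing
# the whole dataset in driver memory at once.
with open("C:\\Users\\username\\test.json", "w") as output_file:
    for row in df.toJSON().toLocalIterator():
        output_file.write(row + "\n")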