 

Spark write parquet not writing any files, only _SUCCESS


The application includes:

val stats = sqlContext.sql("select id, n from myTable")

stats.write.parquet("myTable.parquet")

This creates the directory myTable.parquet with no contents other than an empty _SUCCESS file, even though the DataFrame has data:

stats.show  // illustrative data only; the real table's size is what motivates Parquet

+-----+----+
|  id |  n |
+-----+----+
|   a |  1 |
|   b |  2 |
+-----+----+

stats.printSchema 

root
 |-- id: string (nullable = true)
 |-- n: long (nullable = true)

How can I make write.parquet write the actual contents of the DataFrame? What is missing?

Note: this also occurs with saveAsTextFile.
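For illustration, a text-file equivalent that shows the same symptom (a sketch; stats.rdd converts the DataFrame to an RDD of Rows first):

stats.rdd.saveAsTextFile("myTable.txt")  // also produces a directory holding only _SUCCESS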

asked Jun 06 '16 by echo

1 Answer

In my case, this happened because I was trying to save the file to my local filesystem rather than to a filesystem accessible from the Spark cluster.

The files are written by the Spark worker nodes, not by the PySpark client, so the output path must be on a filesystem that both the worker nodes and the client can reach.
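A minimal sketch, assuming HDFS is reachable at the hypothetical URL below; any storage that every worker node and the client can see works the same way:

// Hypothetical shared-storage paths -- substitute your cluster's own.
// HDFS is visible to all workers and to the client:
stats.write.parquet("hdfs://namenode:8020/user/me/myTable.parquet")

// A shared mount (e.g. NFS) also works, provided the identical path
// is mounted on every worker node and on the client:
stats.write.parquet("file:///mnt/shared/myTable.parquet")

With a bare local path such as "myTable.parquet", each worker writes its part files to its own local disk, so the client typically ends up seeing only the driver-side _SUCCESS marker.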

answered Oct 13 '22 by ostrokach