The application includes
val stats = sqlContext.sql("select id, n from myTable")
stats.write.parquet("myTable.parquet")
This creates the directory myTable.parquet containing nothing but an empty _SUCCESS file, even though the DataFrame is not empty:
stats.show // shown here for illustration only; the original data's size is what motivates using Parquet
+-----+----+
| id | n |
+-----+----+
| a | 1 |
| b | 2 |
+-----+----+
stats.printSchema
root
|-- id: string (nullable = true)
|-- n: long (nullable = true)
How can I make write.parquet write the actual contents of the DataFrame? What is missing?
Note: this also occurs with saveAsTextFile.
In my case, this happened when I was trying to save the file to my local filesystem instead of a filesystem accessible from the Spark cluster.
The file is written by the Spark worker nodes, not by the PySpark client, so it should be written to a filesystem that is accessible to both the worker nodes and the client.
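For illustration, here is a minimal sketch of the fix in Scala, assuming the same sqlContext and table as in the question; the HDFS namenode host/port, output paths, and S3 bucket name are hypothetical placeholders, not values from the original setup:

// As in the question:
val stats = sqlContext.sql("select id, n from myTable")

// Writing to a worker-local path: each executor writes its part files to its
// own local disk, so the directory seen by the client ends up with only _SUCCESS.
// stats.write.parquet("file:///tmp/myTable.parquet")

// Writing to a shared filesystem (HDFS here) that all workers and the client
// can reach; namenode:8020 and the path are placeholders.
stats.write.parquet("hdfs://namenode:8020/user/me/myTable.parquet")

// Or to S3, if the s3a connector and credentials are configured;
// the bucket name is a placeholder.
// stats.write.parquet("s3a://my-bucket/myTable.parquet")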