Saving dataframe to local file system results in empty results

1 Answer

This is not a bug; it is the expected behavior. Spark does not really support writes to non-distributed storage (it works in local mode only because the driver and the executors share a file system).

A local path is not interpreted (only) as a path on the driver (that would require collecting the data) but as a local path on each executor. Therefore each executor writes its own chunk to its own local file system.
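To make this concrete, here is a toy model in plain Python (no Spark involved). Each "executor" is just a separate directory standing in for a separate machine's local disk, and `simulate_local_write`, `executor_roots` and `driver_root` are hypothetical names made up for the illustration. It is only a sketch of the effect described above, not how Spark is implemented.

```python
import os


def simulate_local_write(partitions, executor_roots, driver_root, rel_path="out"):
    """Toy model: every 'executor' writes its partitions under its OWN root.

    partitions     -- list of lists of text lines, one inner list per partition
    executor_roots -- directories standing in for each executor's local disk
    driver_root    -- directory standing in for the driver's local disk
    """
    for index, rows in enumerate(partitions):
        # Assign partitions to executors round-robin (an assumption of this
        # toy model; a real scheduler decides placement differently).
        root = executor_roots[index % len(executor_roots)]
        target = os.path.join(root, rel_path)
        os.makedirs(target, exist_ok=True)
        part_file = os.path.join(target, f"part-{index:05d}.csv")
        with open(part_file, "w") as handle:
            handle.write("\n".join(rows) + "\n")

    # In this toy model the driver only ends up with an empty directory,
    # because no partition was ever written on its own disk.
    os.makedirs(os.path.join(driver_root, rel_path), exist_ok=True)
```

Listing `out` under the driver root afterwards shows nothing, while the part files are scattered across the executor roots. That is roughly what you observe when you look at the output path on the master node only.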

Not only is the output not readable back (to load the data, each executor and the driver should see the same state of the file system), but depending on the commit algorithm, it might not even be finalized (moved out of the temporary directory).
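If you really need the result on one machine's local disk, two options follow from the above: write to storage that every node can see, or bring the rows to the driver and write them there. A minimal PySpark-flavoured sketch, assuming `df` is a DataFrame small enough to stream through the driver; the helper names and the example paths in the usage note are hypothetical.

```python
import csv


def write_to_shared_storage(df, shared_uri):
    """Write through Spark to a location visible to the driver and all executors.

    shared_uri is assumed to point at shared storage (for example HDFS or S3).
    """
    df.write.mode("overwrite").csv(shared_uri, header=True)


def write_on_driver(df, local_path):
    """Stream rows to the driver and write one CSV file on the driver's disk.

    Only suitable when the data can reasonably pass through a single machine.
    """
    with open(local_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(df.columns)          # header row
        for row in df.toLocalIterator():     # rows arrive partition by partition
            writer.writerow(list(row))
```

For example, `write_to_shared_storage(df, "hdfs:///tmp/out")` keeps the write distributed, while `write_on_driver(df, "/tmp/out.csv")` trades parallelism for a single local file.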

186

answered Oct 01 '22 20:10

zero323

Tags:

apache-spark

amazon-emr

WestCoastProjects
