I am writing an RDD to a file using the command below:
rdd.coalesce(1).saveAsTextFile(FilePath)
When FilePath is an HDFS path (hdfs://node:9000/folder/), everything works fine.
When FilePath is a local path (file:///home/user/folder/), everything seems to work: the output folder is created and the _SUCCESS file is present. However, I do not see any part-00000 file containing the output, and there are no other files. There is no error in the Spark console output either.
I also tried calling collect() on the RDD before calling saveAsTextFile(), and giving 777 permissions to the output folder, but nothing works.
Please help.
1. Write a single file using Spark coalesce() & repartition(): when you are ready to write a DataFrame, first use repartition() or coalesce() to merge the data from all partitions into a single partition, and then save it to a file, as sketched below.
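A minimal sketch of that pattern, assuming a hypothetical DataFrame df read from a placeholder input path (the paths and app name are illustrative, not taken from the question):

import org.apache.spark.sql.SparkSession

// Hypothetical session and paths, shown only to illustrate the pattern.
val spark = SparkSession.builder.appName("single-file-write").getOrCreate()
val df = spark.read.option("header", "true").csv("hdfs://node:9000/input/")

// coalesce(1) merges all partitions into one before writing,
// so the output directory ends up with a single part file.
df.coalesce(1)
  .write
  .mode("overwrite")
  .csv("hdfs://node:9000/folder/single-output/")

// repartition(1) achieves the same result, but with a full shuffle, which can be
// preferable when the existing partitions are heavily skewed.
df.repartition(1)
  .write
  .mode("overwrite")
  .csv("hdfs://node:9000/folder/single-output-repartition/")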
Spark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, Amazon S3, etc. Spark supports text files, SequenceFiles, and any other Hadoop InputFormat.
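For illustration, the same read APIs accept different URI schemes; this is a spark-shell style sketch, and the hosts, bucket, and paths below are placeholders:

import org.apache.hadoop.io.Text
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("sources-demo").master("local[*]").getOrCreate()
val sc = spark.sparkContext

// Same textFile API, different storage backends; hosts and paths are placeholders.
val fromLocal = sc.textFile("file:///home/user/data/input.txt")
val fromHdfs  = sc.textFile("hdfs://node:9000/data/input.txt")
val fromS3    = sc.textFile("s3a://my-bucket/data/input.txt")

// SequenceFiles (and other Hadoop InputFormats) are read in the same way.
val fromSeq = sc.sequenceFile("hdfs://node:9000/data/seq/", classOf[Text], classOf[Text])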
Saving text files: Spark provides a function called saveAsTextFile(), which takes a path and writes the contents of the RDD to files under that path. The path is treated as a directory, and multiple output files will be produced in that directory.
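For example (assuming sc is the spark-shell SparkContext; the path and partition count are illustrative):

// The RDD has 3 partitions, so saveAsTextFile() produces one part file per partition.
val rdd = sc.parallelize(Seq("a", "b", "c", "d", "e", "f"), numSlices = 3)
rdd.saveAsTextFile("hdfs://node:9000/folder/output/")

// Resulting directory layout (illustrative):
//   /folder/output/_SUCCESS
//   /folder/output/part-00000
//   /folder/output/part-00001
//   /folder/output/part-00002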
Why do we need the CRC and _SUCCESS files? Spark worker nodes write data simultaneously, and these files act as a checksum and job-completion marker for validating the output. Also note that writing to a single file works against the idea of distributed computing, and this approach may fail if your resulting file is too large.
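If the marker and checksum files get in the way, they can usually be suppressed through the underlying Hadoop configuration. This is a hedged sketch using standard Hadoop settings rather than anything required by the question; rdd and the output path are placeholders:

// Suppress the _SUCCESS marker written by the Hadoop output committer.
sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")

// Disable client-side .crc checksum files for the local filesystem.
import org.apache.hadoop.fs.FileSystem
val fs = FileSystem.get(new java.net.URI("file:///"), sc.hadoopConfiguration)
fs.setWriteChecksum(false)

rdd.coalesce(1).saveAsTextFile("file:///home/user/folder/")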
Saving to a local path only has the expected effect when you are using a local master. With a non-local master, each executor writes its part files to its own local filesystem, so the machine you are looking at (the driver) ends up with only the _SUCCESS marker and no part files.
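A common workaround, assuming the result is small enough to fit in driver memory (the output path below is a placeholder), is to collect to the driver and write with plain file I/O:

import java.io.PrintWriter

// Bring the (small) result to the driver and write it on the driver's own disk.
val lines = rdd.collect()
val writer = new PrintWriter("/home/user/folder/output.txt")
try {
  lines.foreach(writer.println)
} finally {
  writer.close()
}

Alternatively, keep writing to HDFS, which already works for you, and pull the merged result down to the local filesystem with hdfs dfs -getmerge.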