How does Spark on Yarn store shuffled files?

Tags:

apache-spark

I'm performing a filter in Spark on YARN and receiving the error below. Any help is appreciated, but my main question is why the file is not found.

/hdata/10/yarn/nm/usercache/spettinato/appcache/application_1428497227446_131967/spark-local-20150708124954-aa00/05/merged_shuffle_1_343_1

It appears that Spark can't find a file that has been stored to HDFS after being shuffled.

Why is Spark accessing the directory "/hdata/"? It does not exist in HDFS, so is it supposed to be a local directory or an HDFS directory?
Can I configure the location where shuffled data is stored?

15/07/08 12:57:03 WARN TaskSetManager: Loss was due to java.io.FileNotFoundException
java.io.FileNotFoundException: /hdata/10/yarn/nm/usercache/spettinato/appcache/application_1428497227446_131967/spark-local-20150708124954-aa00/05/merged_shuffle_1_343_1 (No such file or directory)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:116)
        at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:177)
        at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:161)
        at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:158)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

EDIT: I figured out some of this. The directory configured by spark.local.dir is the local scratch directory Spark uses for map output (shuffle) files and for RDDs that get stored on disk, per http://spark.apache.org/docs/latest/configuration.html
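To sketch the configuration side of this (property names are from the Spark and Hadoop documentation; the directory paths below are made-up examples): outside of YARN, shuffle files land under spark.local.dir, but when running on YARN that setting is overridden for executors by the NodeManager's local directories, which is consistent with the /hdata/.../yarn/nm/usercache/... path in the error above. So the location is configured on the YARN side:

```xml
<!-- yarn-site.xml (example paths, mirroring the /hdata layout in the error):
     on YARN, executors write their shuffle/scratch files under these
     NodeManager local dirs, and spark.local.dir is ignored for them. -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/hdata/1/yarn/nm,/hdata/2/yarn/nm</value>
</property>
```

These are plain local filesystem paths on each node, not HDFS paths, which would answer the first question above: /hdata/ is local disk on the worker, not a directory in HDFS.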

asked Jul 08 '15 by pettinato


2 Answers

I suggest checking the space left on your system. Like Carlos, I'd say the task died, and the reason is that Spark could not write a shuffle file due to lack of disk space.

Try grepping for "java.io.IOException: No space left on device" in the ./work directory of your workers.
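A minimal sketch of that check, simulated here on a sample log line (on a real cluster you would run the grep against each worker's actual ./work directory, and the df path would be your own local dirs):

```shell
# Simulate a worker log containing the disk-full exception
mkdir -p work
printf '%s\n' \
  '15/07/08 12:57:03 ERROR Executor: java.io.IOException: No space left on device' \
  > work/stderr.log

# The actual check: recursively list files under ./work that mention disk-full errors
grep -Rl "No space left on device" work

# Also worth checking free space on the shuffle directories themselves, e.g.:
# df -h /hdata/*/yarn/nm
```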

answered Sep 27 '22 by Bacon


The most likely explanation is that the task died, for example from an OutOfMemoryError or some other exception.

answered Sep 27 '22 by Carlos Rendon