
Running Spark on YARN in yarn-cluster mode: Where does the console output go?

I followed this page and ran the SparkPi example application on YARN in yarn-cluster mode.


I don't see the program's output at the end (in this case, the result of the computation). When I run it in yarn-client mode (--master yarn-client), I see output like this:

Pi is roughly 3.138796

Where does standard output go in yarn-cluster mode?

Muffintop asked Oct 12 '14 04:10


1 Answer

After much poking around, I found this in the Spark 0.9.0 documentation:

Examine the output (replace $YARN_APP_ID in the following with the "application identifier" output by the previous command) (Note: YARN_APP_LOGS_DIR is usually /tmp/logs or $HADOOP_HOME/logs/userlogs depending on the Hadoop version.)

$ cat $YARN_APP_LOGS_DIR/$YARN_APP_ID/container*_000001/stdout

Pi is roughly 3.13794

I wish they had put this instruction in the 1.1.0 documentation too.
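For repeated use, the lookup can be wrapped in a small shell function. This is only a sketch under assumptions: the function name driver_stdout is made up for illustration, it assumes container logs are still on the local disk in the layout quoted from the 0.9.0 doc, and it assumes the driver (which in yarn-cluster mode runs inside the application master) is in the container whose name ends in _000001.

```shell
# Hypothetical helper: print the driver's stdout for one YARN application.
# Assumes non-aggregated container logs laid out as
#   <logs_dir>/<app_id>/container*_000001/stdout
# Returns 0 if at least one stdout file was printed, 1 otherwise.
driver_stdout() {
  logs_dir="$1"   # e.g. the value of $YARN_APP_LOGS_DIR
  app_id="$2"     # the application identifier printed at submit time
  found=1
  for f in "$logs_dir/$app_id"/container*_000001/stdout; do
    # If the glob matches nothing, $f is the literal pattern and -f fails.
    if [ -f "$f" ]; then
      cat "$f"
      found=0
    fi
  done
  return "$found"
}
```

Usage would look like: driver_stdout "$YARN_APP_LOGS_DIR" "$YARN_APP_ID". If log aggregation is enabled on the cluster, the local directory may be empty once the application finishes; in that case yarn logs -applicationId followed by the application identifier is the usual way to fetch the same output.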

Muffintop answered Sep 20 '22 04:09
