I followed this page and ran the SparkPi example application on YARN in yarn-cluster mode.
http://spark.apache.org/docs/latest/running-on-yarn.html
I don't see the program's output at the end (which, in this case, is the result of the computation). When I run it in yarn-client mode (--master yarn-client), I see output like this:
Pi is roughly 3.138796
Where does standard out go in yarn-cluster mode?
In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
To launch a Spark application in cluster mode, you have to use the spark-submit command. You cannot run yarn-cluster mode via spark-shell, because the interactive shell requires the driver to run locally, while in cluster mode the driver runs inside the ApplicationMaster container on the cluster.
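For reference, here is roughly what the two modes look like with spark-submit; this is a sketch assuming a Spark 1.x layout, and the examples jar path is illustrative:

# Cluster mode: the driver runs in the YARN ApplicationMaster, so
# "Pi is roughly ..." goes to that container's stdout, not your terminal.
$ spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    lib/spark-examples-*.jar 10

# Client mode: the driver runs locally, so the result prints to your terminal.
$ spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-client \
    lib/spark-examples-*.jar 10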
No, it is not necessary to install Spark on all three nodes. Since Spark runs on top of YARN, YARN takes care of distributing and executing Spark's work across the cluster's nodes.
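Concretely, the node you submit from just needs the Hadoop client configuration so Spark can find the YARN ResourceManager; YARN then ships the Spark assembly and your application jar to the workers. A minimal sketch (the config path is illustrative):

# Only the gateway node you submit from needs a Spark install.
$ export HADOOP_CONF_DIR=/etc/hadoop/conf   # illustrative path to the cluster's client configs
$ spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    lib/spark-examples-*.jar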
After much poking around, I found this in the Spark 0.9.0 docs:
Examine the output, replacing $YARN_APP_ID in the following with the "application identifier" printed by the previous command. (Note: YARN_APP_LOGS_DIR is usually /tmp/logs or $HADOOP_HOME/logs/userlogs, depending on the Hadoop version.)
$ cat $YARN_APP_LOGS_DIR/$YARN_APP_ID/container*_000001/stdout
Pi is roughly 3.13794
I wish they had put this instruction in the 1.1.0 documentation too.
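On clusters where YARN log aggregation is enabled, the yarn CLI offers a shortcut to the same stdout. A sketch; the application ID below is a placeholder for whatever spark-submit prints:

# Assumes yarn.log-aggregation-enable=true and that the application has finished.
$ yarn logs -applicationId application_1414530900704_0003 | grep "Pi is roughly"
Pi is roughly 3.13794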