I'm trying to run a Spark application on YARN in cluster mode (2 nodes), but the nodes seem imbalanced: only one node is doing work while the other sits idle.
My script:
spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode cluster \
    --num-executors 2 \
    --driver-memory 1G \
    --executor-memory 1G \
    --executor-cores 2 \
    spark-examples-1.6.1-hadoop2.6.0.jar 1000
I can see that one of my nodes is working but the other is not, so the cluster is imbalanced:
(Note: in the screenshots, the namenode is on the left and the datanode is on the right.)
Any ideas?
No, it is not necessary to install Spark on every node. Since Spark runs on top of YARN, it uses YARN to execute its jobs across the cluster's nodes, so you only have to install Spark on the node you submit from.
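To verify that both NodeManagers are registered with the ResourceManager and able to receive containers, you can run a quick check from the command line (assuming the yarn CLI is on your PATH):

yarn node -list

This prints the running NodeManagers along with the number of containers on each; both of your nodes should appear here.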
Spark supports two modes for running on YARN: yarn-cluster mode and yarn-client mode.
In yarn-cluster mode, the Spark driver runs inside an ApplicationMaster process that YARN manages on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the ApplicationMaster is only used to request resources from YARN.
When you run spark-submit, a driver program is launched; it requests resources from the cluster manager and, at the same time, starts the main function of your application.
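As a minimal sketch of the difference, here is the same SparkPi job submitted in each mode (using the example jar from the question; Spark 1.6 syntax):

# yarn-cluster mode: the driver runs in a YARN ApplicationMaster on the cluster
spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode cluster \
    spark-examples-1.6.1-hadoop2.6.0.jar 1000

# yarn-client mode: the driver runs locally in the client process
spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode client \
    spark-examples-1.6.1-hadoop2.6.0.jar 1000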
The complete dataset could be local to one of the nodes, so Spark might be honouring data locality and scheduling all tasks on that node. You can try the following config when launching spark-submit:
--conf "spark.locality.wait.node=0"
This worked for me.
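Applied to the command from the question, the full submit would look like this (a sketch; setting spark.locality.wait.node to 0 tells the scheduler not to wait for node-local placement, so tasks can be scheduled on the other node immediately):

spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode cluster \
    --num-executors 2 \
    --driver-memory 1G \
    --executor-memory 1G \
    --executor-cores 2 \
    --conf "spark.locality.wait.node=0" \
    spark-examples-1.6.1-hadoop2.6.0.jar 1000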