 

Running spark-submit on YARN but load is imbalanced (only 1 node is working)

I am trying to run a Spark application on a YARN cluster (2 nodes), but the two nodes seem imbalanced: only one node is doing any work while the other stays idle.

My script:

spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster --deploy-mode cluster --num-executors 2 \
    --driver-memory 1G \
    --executor-memory 1G \
    --executor-cores 2 spark-examples-1.6.1-hadoop2.6.0.jar 1000

I can see that one of my nodes is working while the other is not, so the load is imbalanced:

(screenshot of per-node resource usage) Note: the namenode is on the left and the datanode is on the right.

Any ideas?

asked Aug 18 '16 by zukijuki

People also ask

Do we need to install Spark on all nodes of YARN?

No, it is not necessary to install Spark on all the nodes. Since Spark runs on top of YARN, it uses YARN to execute its work across the cluster's nodes, so you only have to install Spark on one node.
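As a rough sketch of that single-node setup (the install paths and the example jar location below are assumptions, not taken from the question), the submitting node only needs the Spark binaries plus a pointer to the cluster's Hadoop/YARN configuration:

# Assumed install locations; adjust to your environment.
export HADOOP_CONF_DIR=/etc/hadoop/conf   # where the YARN/HDFS client configs live
export SPARK_HOME=/opt/spark

# Submit from this one node; YARN schedules the executors across the cluster.
$SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode cluster \
    $SPARK_HOME/lib/spark-examples-1.6.1-hadoop2.6.0.jar 1000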

What is the difference between running Spark submit in YARN client mode vs YARN cluster mode?

In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
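A minimal sketch of the two submission forms, reusing the example jar from the question (the jar is assumed to sit in the current directory; on Spark 1.6 the shorthand masters yarn-cluster and yarn-client are equivalent to these):

# Cluster mode: the driver runs inside the YARN application master,
# so the client can disconnect once the application is accepted.
spark-submit --master yarn --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi spark-examples-1.6.1-hadoop2.6.0.jar 1000

# Client mode: the driver runs in the local client process,
# and the application master only negotiates resources from YARN.
spark-submit --master yarn --deploy-mode client \
    --class org.apache.spark.examples.SparkPi spark-examples-1.6.1-hadoop2.6.0.jar 1000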

What are the two ways to run Spark on YARN?

Spark supports two modes for running on YARN, “yarn-cluster” mode and “yarn-client” mode.

What happens when we run Spark submit?

When you run spark-submit, a driver program is launched. The driver requests resources from the cluster manager and, once they are granted, starts executing the main program of the user's application.
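If you want to watch that happen, the standard YARN CLI shows the application that spark-submit registered (the application ID below is a placeholder):

# List applications currently tracked by the ResourceManager, including the Spark job.
yarn application -list

# After the run, pull the aggregated driver (application master) and executor logs.
yarn logs -applicationId application_1471500000000_0001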


1 Answer

The complete dataset could be local to one of the nodes, so Spark might be trying to honour data locality. You can try the following configuration when launching spark-submit:

--conf "spark.locality.wait.node=0"

The same worked for me.
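For reference, here is a sketch of the question's original command with that setting added; everything else is kept as posted, so the jar name and memory values are the asker's, not verified ones. Setting spark.locality.wait.node to 0 tells the scheduler not to wait for a node-local slot, so tasks can be placed on the idle node immediately:

spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster --deploy-mode cluster --num-executors 2 \
    --driver-memory 1G --executor-memory 1G --executor-cores 2 \
    --conf "spark.locality.wait.node=0" \
    spark-examples-1.6.1-hadoop2.6.0.jar 1000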

answered Oct 04 '22 by Harshit