I'm trying to run a Spark application on YARN in cluster mode (2 nodes), but the nodes seem imbalanced: only one node is doing work while the other sits idle.
My script:
spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode cluster \
    --num-executors 2 \
    --driver-memory 1G \
    --executor-memory 1G \
    --executor-cores 2 \
    spark-examples-1.6.1-hadoop2.6.0.jar 1000
I can see that one of my nodes is working but the other is not, so the cluster is imbalanced:
(Note: in the screenshots, the namenode is on the left and the datanode is on the right.)
Any ideas?
No, it is not necessary to install Spark on every node. Since Spark runs on top of YARN, it uses YARN to execute its jobs across the cluster's nodes, so you only have to install Spark on the node you submit from.
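To verify that both NodeManagers are registered with the ResourceManager and able to receive containers, you can run a quick check from the command line (assuming the yarn CLI is on your PATH):

yarn node -list

This prints the running NodeManagers along with the number of containers on each; both of your nodes should appear here.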
Spark supports two modes for running on YARN: yarn-cluster mode and yarn-client mode.
In yarn-cluster mode, the Spark driver runs inside an ApplicationMaster process that YARN manages on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the ApplicationMaster is only used to request resources from YARN.
When you run spark-submit, a driver program is launched; it requests resources from the cluster manager and, at the same time, starts the main function of your application.
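As a minimal sketch of the difference, here is the same SparkPi job submitted in each mode (using the example jar from the question; Spark 1.6 syntax):

# yarn-cluster mode: the driver runs in a YARN ApplicationMaster on the cluster
spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode cluster \
    spark-examples-1.6.1-hadoop2.6.0.jar 1000

# yarn-client mode: the driver runs locally in the client process
spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode client \
    spark-examples-1.6.1-hadoop2.6.0.jar 1000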
The complete dataset could be local to one of the nodes, so Spark might be honouring data locality and scheduling all tasks on that node. You can try the following config when launching spark-submit:
--conf "spark.locality.wait.node=0"
This worked for me.
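Applied to the command from the question, the full submit would look like this (a sketch; setting spark.locality.wait.node to 0 tells the scheduler not to wait for node-local placement, so tasks can be scheduled on the other node immediately):

spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode cluster \
    --num-executors 2 \
    --driver-memory 1G \
    --executor-memory 1G \
    --executor-cores 2 \
    --conf "spark.locality.wait.node=0" \
    spark-examples-1.6.1-hadoop2.6.0.jar 1000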