
Setting up dynamic allocation in Apache Spark?

I am following the instructions here for setting up dynamic allocation with the YARN resource manager.

However, I am confused by step 3: Add this jar to the classpath of all NodeManagers in your cluster.

Does this mean going to each node and adding the path to the shuffle jar to the PATH environment variable, e.g. export PATH=$PATH:<loc-to-shuffle.jar>?

asked Sep 22 '16 16:09 by THIS USER NEEDS HELP




1 Answer

No, PATH is not involved. The YARN classpath is set on every NodeManager through the yarn.application.classpath property in yarn-site.xml, which holds a comma-separated list of CLASSPATH entries.

When this value is empty, the following default CLASSPATH for YARN applications is used.

  • For Linux:
$HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/share/hadoop/common/*, $HADOOP_COMMON_HOME/share/hadoop/common/lib/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, $HADOOP_YARN_HOME/share/hadoop/yarn/*, $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
  • For Windows:
%HADOOP_CONF_DIR%, %HADOOP_COMMON_HOME%/share/hadoop/common/*, %HADOOP_COMMON_HOME%/share/hadoop/common/lib/*, %HADOOP_HDFS_HOME%/share/hadoop/hdfs/*, %HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*, %HADOOP_YARN_HOME%/share/hadoop/yarn/*, %HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*

So put spark-<version>-yarn-shuffle.jar in one of the directories listed in yarn.application.classpath or, if that property is empty, in one of the default classpath directories above.

Alternatively, create a soft link to spark-<version>-yarn-shuffle.jar in one of the YARN classpath directories.
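As a rough sketch of the soft-link approach, something like the following could be run on each NodeManager host. The function name link_shuffle_jar is hypothetical, and both paths are assumptions that depend on where Spark and Hadoop are installed on your nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: symlink the Spark shuffle jar into a directory that is
# already on the YARN classpath. Both arguments are assumptions to adapt.
link_shuffle_jar() {
  local jar="$1"         # absolute path to spark-<version>-yarn-shuffle.jar
  local target_dir="$2"  # e.g. "$HADOOP_YARN_HOME/share/hadoop/yarn/lib"

  if [ ! -f "$jar" ]; then
    echo "shuffle jar not found: $jar" >&2
    return 1
  fi
  if [ ! -d "$target_dir" ]; then
    echo "classpath directory not found: $target_dir" >&2
    return 1
  fi

  # -s: symbolic link; -f: replace a stale link left by an older Spark version.
  # Pass an absolute jar path, otherwise the link may dangle.
  ln -sf "$jar" "$target_dir/$(basename "$jar")"
}

# Example call (placeholder paths, repeat on every NodeManager host):
# link_shuffle_jar /path/to/spark-<version>-yarn-shuffle.jar "$HADOOP_YARN_HOME/share/hadoop/yarn/lib"
```

After the jar or link is in place, the NodeManagers need to be restarted so that they pick it up.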

Hope this helps...

answered Sep 22 '22 12:09 by Anupam Jain