
Is it possible for multiple Executors to be launched within a single Spark worker for one Spark Application?

Tags:

apache-spark

We are seeing more cores in use than SPARK_WORKER_CORES (which we set to 1) should allow. As part of tracking this down, let us consider two of the Spark components and their descriptions in the docs:

  • Worker node: Any node that can run application code in the cluster

  • Executor: A process launched for an application on a worker node, that runs tasks and keeps data in memory or disk storage across them. Each application has its own executors.

So, regarding that last sentence: can there be multiple executors on a given worker node for a single application, or only one?

asked May 01 '15 by WestCoastProjects


2 Answers

Yes, it is possible. In principle, you configure Spark with a certain number of executors and a certain number of cores per executor. How those map onto nodes is up to YARN or whichever cluster manager handles the resources, but AFAIK Spark is pretty much agnostic to that.

If a node has enough memory and cores, it is entirely possible that the cluster manager assigns two executors of the same application to that node. At the end of the day, executors are just resources to be handed out. You'll see that the configuration examples from the docs show no concern for nodes:

$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    --num-executors 3 \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    lib/spark-examples*.jar \
    10
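
If you are on a standalone cluster instead (which SPARK_WORKER_CORES suggests), the analogous lever in later Spark releases is spark.executor.cores: when it is set lower than the number of cores a worker offers, Spark can start several executors of the same application on that worker; otherwise each application gets at most one executor per worker. A minimal sketch, where the master host name and the resource numbers are placeholders, not values from the question:

# Placeholder host and sizes: with, say, 4-core workers and one core per
# executor, several executors of this application can land on one worker.
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master spark://master-host:7077 \
    --conf spark.executor.cores=1 \
    --executor-memory 2g \
    --total-executor-cores 8 \
    lib/spark-examples*.jar \
    10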
answered Nov 15 '22 by Daniel Langdon


You first need to configure your Spark standalone cluster, and then set the amount of resources needed for each individual Spark application you want to run.

In order to configure the cluster, you can try this:

In conf/spark-env.sh (a minimal sketch of the file follows this list):

  • Set SPARK_WORKER_INSTANCES=10, which determines the number of worker instances (and hence executors) per node (the default is 1).

  • Set SPARK_WORKER_CORES=15, the number of cores one worker can use (default: all cores, 36 in your case).

  • Set SPARK_WORKER_MEMORY=55g, the total amount of memory that can be used on one machine (worker node) for running Spark programs.

Copy this configuration file to all worker nodes, into the same folder. Then start your cluster by running the scripts in sbin (sbin/start-all.sh, ...). As you have 5 machines, with the above configuration you should see 5 (machines) * 10 (worker instances per machine) = 50 alive workers on the master's web interface (http://localhost:8080 by default), and a single application can then get up to 50 executors.
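
For reference, here is a minimal sketch of what that conf/spark-env.sh could look like with the numbers above (they come from this answer; tune them to your own machines):

# conf/spark-env.sh -- copy to the same path on every worker node
export SPARK_WORKER_INSTANCES=10   # worker processes per machine (default: 1)
export SPARK_WORKER_CORES=15       # cores each worker may give to executors
export SPARK_WORKER_MEMORY=55g     # memory each worker may give to executors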

When you run an application in standalone mode, by default it will acquire all available cores in the cluster (and with them all the executors the workers can offer). You need to explicitly set the amount of resources for running this application, e.g.:

val conf = new SparkConf()
  .setMaster(...)
  .setAppName(...)
  .set("spark.executor.memory", "2g")
  .set("spark.cores.max", "10")
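
The same limits can also be passed to spark-submit instead of being hard-coded; a small sketch, where the master URL, class name and jar name are placeholders:

$ ./bin/spark-submit --class com.example.MyApp \
    --master spark://master-host:7077 \
    --executor-memory 2g \
    --total-executor-cores 10 \
    myapp.jar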

answered Nov 15 '22 by dilshad