I'm trying to setup a standalone Spark 2.0 server to process an analytics function in parallel. To do this I want to have a single worker with multiple executors.
I'm using :
This is just for pure proof of concept purposes but I want to have 8 executors, one per each core.
I've tried to follow the other threads on this topic but for some reason it's not working for me. IE: Spark Standalone Number Executors/Cores Control
My configuration is as follows:
conf\spark-defaults.conf
spark.cores.max = 8
spark.executor.cores = 1
I have tried to also change my spark-env.sh file to no avail. Instead what is happening is that it shows that my 1 worker only has 1 executor on it. As you can see below, it still shows the standalone with 1 executor with 8 cores to it.
It means each executor uses 5 cores. Each node has 3 executors therefore using 15 cores, except one of the nodes will also be running the application master for the job, so can only host 2 executors i.e. 10 cores in use as executors.
Yes, A worker node can be holding multiple executors (processes) if it has sufficient CPU, Memory and Storage.
SparkConf conf = new SparkConf() // 4 executor per instance of each worker . set("spark. executor. instances", "4") // 5 cores on each executor .
The --num-executors option to the Spark YARN client controls how many executors it will allocate on the cluster, while --executor-memory and --executor-cores control the resources per executor.
I believe you mixed up local and standalone modes:
local
, local[*]
or local[n]
. spark.executor.cores
and spark.executor.cores
are not applicable in the local mode because there is only one embedded executor.Standalone mode requires a standalone Spark cluster. It requires a master node (can be started using SPARK_HOME/sbin/start-master.sh
script) and at least one worker node (can be started using SPARK_HOME/sbin/start-slave.sh
script).
SparkConf
should use master node address to create (spark://host:port
).
You first need to configure your spark standalone cluster, then set the amount of resources needed for each individual spark application you want to run.
In order to configure the cluster, you can try this:
In conf/spark-env.sh:
Set the SPARK_WORKER_INSTANCES = 10
which determines the number of Worker instances (#Executors) per node (its default value is only 1)
Set the SPARK_WORKER_CORES = 15
number of cores that one Worker can use (default: all cores, your case is 36)
Set SPARK_WORKER_MEMORY = 55g
total amount of memory that can be used on one machine (Worker Node) for running Spark programs. Copy this configuration file to all Worker Nodes, on the same folder Start your cluster by running the scripts in sbin (sbin/start-all.sh, ...) As you have 5 workers, with the above configuration you should see 5 (workers) * 10 (executors per worker) = 50 alive executors on the master's web interface (http://localhost:8080 by default)
When you run an application in standalone mode, by default, it will acquire all available Executors in the cluster. You need to explicitly set the amount of resources for running this application: Eg:
val conf = new SparkConf()
.setMaster(...)
.setAppName(...)
.set("spark.executor.memory", "2g")
.set("spark.cores.max", "10")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With