Lately I have found myself a bit confused between the different Spark settings spark.executor.memory, SPARK_WORKER_MEMORY, SPARK_MEM, SPARK_MASTER_MEMORY, and their relationship to SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES.
I found the post "Spark Configuration: SPARK_MEM vs. SPARK_WORKER_MEMORY", but it does not discuss SPARK_MASTER_MEMORY.
According to the recommendations discussed above, leave 1 core per node for the Hadoop/YARN daemons, so with 16-core nodes the number of cores available per node = 16 - 1 = 15. With 10 such nodes, the total cores available in the cluster = 15 x 10 = 150. Number of available executors = (total cores / num-cores-per-executor) = 150 / 5 = 30.
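As a rough sketch, those numbers would translate into spark-submit options roughly as follows (assuming a YARN cluster; the 6g executor memory and the my-app.jar name are purely illustrative and not part of the calculation above):

    # 30 executors x 5 cores = 150 cores (10 nodes x 15 usable cores each)
    spark-submit \
      --master yarn \
      --num-executors 30 \
      --executor-cores 5 \
      --executor-memory 6g \
      my-app.jar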
Set the number of processors and the amount of memory that a Spark cluster can use by setting environment variables in the spark-env.sh file. For example, SPARK_WORKER_CORES sets the number of CPU cores that Spark applications can use; the default is all cores on the host z/OS system.
The consensus in most Spark tuning guides is that around 5 cores per executor is the optimum in terms of parallel processing.
To enlarge the Spark shuffle service memory, modify SPARK_DAEMON_MEMORY in $SPARK_HOME/conf/spark-env.sh (the default value is 2g), and then restart the shuffle service for the change to take effect.
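As a minimal sketch, the two settings above would go into $SPARK_HOME/conf/spark-env.sh along these lines (the concrete values 12 and 4g are illustrative assumptions, not recommendations):

    # $SPARK_HOME/conf/spark-env.sh
    # Cap the cores that Spark applications may use on this host (default: all cores)
    export SPARK_WORKER_CORES=12
    # Memory for the Spark daemons, including the external shuffle service
    export SPARK_DAEMON_MEMORY=4g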
First of all, a few words about terminology. The Spark master is the application that coordinates resource allocation from the slaves (workers). The master does not perform any computation; it is just a resource manager.
A Spark worker is the application on a worker node that manages the resources of that node.
A Spark executor is an application created by a Spark worker, which performs tasks on the worker node for the driver.
Check this doc for additional details - http://spark.apache.org/docs/latest/cluster-overview.html
spark.executor.memory - the amount of memory allocated to each executor. This memory is used to run the user's tasks.
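As a sketch, this property is set per application rather than in spark-env.sh, e.g. on the spark-submit command line (the 6g value and my-app.jar are illustrative):

    # long form via --conf
    spark-submit --conf spark.executor.memory=6g my-app.jar
    # equivalent shorthand
    spark-submit --executor-memory 6g my-app.jar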
SPARK_WORKER_MEMORY - how much system memory the worker may use to create executors on its node. For example, if you have 64 GB on a node and set SPARK_WORKER_MEMORY to 60 GB, you can create 2 x 30 GB executors or 10 x 6 GB executors, and so on.
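A minimal spark-env.sh sketch of that example (the 64 GB node and 6g executors are the same hypothetical numbers as above):

    # $SPARK_HOME/conf/spark-env.sh on a 64 GB node
    export SPARK_WORKER_MEMORY=60g
    # With spark.executor.memory=6g, this worker can host up to 10 executors (10 x 6g <= 60g)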
SPARK_MEM - AFAIK it is not used anymore; I cannot find it in the current docs.
SPARK_MASTER_MEMORY - memory for the master. It should not be too high :)
SPARK_WORKER_CORES - the total number of cores that executors may use, per worker.
SPARK_WORKER_INSTANCES - the number of worker instances to run per worker node.
All these parameters are described here - http://spark.apache.org/docs/latest/spark-standalone.html
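Putting the standalone-cluster variables together, a hedged spark-env.sh sketch might look like the following (the concrete numbers are illustrative only; when SPARK_WORKER_INSTANCES is greater than 1, it is usual to also set the per-worker cores and memory explicitly so the instances do not each try to claim the whole machine):

    # $SPARK_HOME/conf/spark-env.sh - example for a single 16-core / 64 GB node
    export SPARK_WORKER_INSTANCES=2   # two worker processes on this node
    export SPARK_WORKER_CORES=7       # cores for executors, per worker instance
    export SPARK_WORKER_MEMORY=30g    # memory for executors, per worker instance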