
Spark Configuration: memory/instance/cores

Tags:

apache-spark

Lately I have found myself a bit confused about the different Spark settings spark.executor.memory, SPARK_WORKER_MEMORY, SPARK_MEM, SPARK_MASTER_MEMORY, and their relationship to SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES.

I found this post, but it does not discuss SPARK_MASTER_MEMORY: Spark Configuration: SPARK_MEM vs. SPARK_WORKER_MEMORY

asked Oct 30 '14 by Oscar


People also ask

How do you decide number of cores needed in Spark?

According to the recommendations discussed above: leave 1 core per node for the Hadoop/YARN daemons => cores available per node = 16 - 1 = 15. So, total cores available in the cluster = 15 x 10 = 150. Number of available executors = (total cores / num cores per executor) = 150 / 5 = 30.
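As a quick sanity check of that arithmetic, here is a short shell sketch using the example figures above (10 nodes, 16 cores each, 5 cores per executor); these numbers are illustrative, not universal values:

    # Example figures from the answer above, not universal values.
    NODES=10
    CORES_PER_NODE=16
    CORES_PER_EXECUTOR=5

    USABLE_CORES=$(( (CORES_PER_NODE - 1) * NODES ))        # leave 1 core per node: 15 * 10 = 150
    NUM_EXECUTORS=$(( USABLE_CORES / CORES_PER_EXECUTOR ))  # 150 / 5 = 30
    echo "executors: $NUM_EXECUTORS"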

Can you configure CPU cores in Spark context?

Set the number of processors and amount of memory that a Spark cluster can use by setting the following environment variables in the spark-env.sh file: SPARK_WORKER_CORES. Sets the number of CPU cores that the Spark applications can use. The default is all cores on the host z/OS system.
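As a sketch, that setting goes into conf/spark-env.sh on each worker node; the value 15 is only an illustration (16 cores minus one reserved for system daemons):

    # $SPARK_HOME/conf/spark-env.sh (on each worker node)
    # Limit how many CPU cores this worker offers to Spark applications.
    SPARK_WORKER_CORES=15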

What is number of cores in Spark?

The consensus in most Spark tuning guides is that 5 cores per executor is the optimum number of cores in terms of parallel processing.
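One way to apply that rule of thumb is on the spark-submit command line; the memory value, class name, and jar below are placeholders, not anything taken from the question:

    # Request 5 cores per executor; the other values are placeholders.
    spark-submit \
      --executor-cores 5 \
      --executor-memory 6g \
      --class com.example.MyApp \
      my-app.jar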

How do I set Spark memory?

To enlarge the Spark shuffle service memory size, modify SPARK_DAEMON_MEMORY in $SPARK_HOME/conf/spark-env.sh (the default value is 2g), and then restart the shuffle service for the change to take effect.
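A minimal sketch of that change, assuming the stock sbin scripts are present in your Spark distribution; 4g is an arbitrary example value:

    # $SPARK_HOME/conf/spark-env.sh
    # Raise the daemon memory (used by the external shuffle service) from the 2g default.
    SPARK_DAEMON_MEMORY=4g

    # Restart the shuffle service so the new value is picked up
    # (script names assume the standard sbin/ layout of a Spark distribution):
    $SPARK_HOME/sbin/stop-shuffle-service.sh
    $SPARK_HOME/sbin/start-shuffle-service.sh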


1 Answer

First of all, just a few words about terms. The Spark master is the application that coordinates resource allocation from the slaves. The master does not perform any computations; it is just a resource manager.

The Spark worker is the application on a worker node that coordinates the resources of that node.

A Spark executor is an application created by a Spark worker; it performs tasks on the worker node on behalf of the driver.

Check this doc for additional details - http://spark.apache.org/docs/latest/cluster-overview.html
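To see these roles in a standalone cluster, you can bring up the daemons by hand; spark://master-host:7077 is a placeholder URL, and on older releases the worker script is named start-slave.sh:

    # On the master node: start the master (resource manager only, no computation).
    $SPARK_HOME/sbin/start-master.sh

    # On each worker node: start a worker and register it with the master.
    $SPARK_HOME/sbin/start-worker.sh spark://master-host:7077

    # Executors are launched by the workers once an application (driver) connects.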

spark.executor.memory - the amount of memory per executor. This memory is used for the user's tasks.
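Since spark.executor.memory is an application-level property, it is typically set per job via spark-submit or spark-defaults.conf; the 4g value and my_app.py below are placeholders:

    # Example value of 4g per executor for one application.
    spark-submit --conf spark.executor.memory=4g my_app.py

    # or equivalently in $SPARK_HOME/conf/spark-defaults.conf:
    # spark.executor.memory  4g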

SPARK_WORKER_MEMORY - how much system memory the worker can use to create executors on the node. For example, if you have 64gb on a node and set SPARK_WORKER_MEMORY to 60gb, you can create 2 x 30gb executors or 10 x 6gb executors, and so on.
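A sketch of that 64gb example: cap the worker at 60gb in spark-env.sh, then size executors per application so they fit into that pool (the 30g figure is the 2 x 30gb case from the text; my_app.py is a placeholder):

    # $SPARK_HOME/conf/spark-env.sh on a 64gb node:
    SPARK_WORKER_MEMORY=60g   # the worker hands out at most 60gb in total

    # Per application, executors are carved out of that pool; at 30g each,
    # at most two executors fit on this worker:
    spark-submit --executor-memory 30g my_app.py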

SPARK_MEM is, AFAIK, not used anymore. I cannot find it in the current docs.

SPARK_MASTER_MEMORY is the memory for the master. It should not need to be too high :)

SPARK_WORKER_CORES is the total number of cores that executors can use on each worker.

SPARK_WORKER_INSTANCES is the number of worker instances per worker node.
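For example, to run two worker instances on one physical node you would usually split that node's cores and memory between them; the figures below are illustrative only:

    # $SPARK_HOME/conf/spark-env.sh
    SPARK_WORKER_INSTANCES=2   # two worker processes on this node
    SPARK_WORKER_CORES=8       # cores per worker instance (illustrative)
    SPARK_WORKER_MEMORY=30g    # memory per worker instance (illustrative)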

All these parameters are described here - http://spark.apache.org/docs/latest/spark-standalone.html

answered Oct 05 '22 by 1esha