Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Spark configuration, what is the difference of SPARK_DRIVER_MEMORY, SPARK_EXECUTOR_MEMORY, and SPARK_WORKER_MEMORY?

I did my work, read the documentation at https://spark.apache.org/docs/latest/configuration.html

in spark-folder/conf/spark-env.sh:

  • SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
  • SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
  • SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)

what is the relationship of above 3 parameters?

As I understand, DRIVER_MEMORY is the max memory master node/process can request. But for driver, how about multiple machine situation, eg. 1 master machine and 2 worker machine, worker machine should also have some memory available for spark driver?

EXECUTOR_MEMORY and WORKER_MEMORY are the same to me, just different names, could this also be explained please?

Thank you very much.

like image 544
keypoint Avatar asked Apr 29 '15 21:04

keypoint


1 Answers

First, you should know that 1 Worker (you can say 1 machine or 1 Worker Node) can launch multiple Executors (or multiple Worker Instances - the term they use in the docs).

  • SPARK_WORKER_MEMORY is only used in standalone deploy mode
  • SPARK_EXECUTOR_MEMORY is used in YARN deploy mode

In Standalone mode, you set SPARK_WORKER_MEMORY to the total amount of memory can be used on one machine (All Executors on this machine) to run your spark applications.

In contrast, In YARN mode, you set SPARK_DRIVER_MEMORY to the memory of one Executor

  • SPARK_DRIVER_MEMORY is used in YARN deploy mode, specifying the memory for the Driver that runs your application & communicates with Cluster Manager.
like image 83
ngtrkhoa Avatar answered Oct 05 '22 23:10

ngtrkhoa