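I did my homework and read the documentation at https://spark.apache.org/docs/latest/configuration.html
In spark-folder/conf/spark-env.sh there are three memory settings: SPARK_DRIVER_MEMORY, SPARK_EXECUTOR_MEMORY, and SPARK_WORKER_MEMORY.
What is the relationship between these three parameters?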
As I understand it, DRIVER_MEMORY is the maximum memory the master node/process can request. But what about a multi-machine setup, e.g. 1 master machine and 2 worker machines: should the worker machines also have some memory available for the Spark driver?
EXECUTOR_MEMORY and WORKER_MEMORY look the same to me, just under different names. Could this also be explained, please?
Thank you very much.
First, you should know that 1 Worker (you can say 1 machine or 1 Worker Node) can launch multiple Executors (or multiple Worker Instances - the term they use in the docs).
SPARK_WORKER_MEMORY is only used in standalone deploy mode.
SPARK_EXECUTOR_MEMORY is used in YARN deploy mode.
In standalone mode, you set SPARK_WORKER_MEMORY to the total amount of memory that can be used on one machine (by all Executors on that machine) to run your Spark applications.
In contrast, in YARN mode, you set SPARK_EXECUTOR_MEMORY to the memory of one Executor.
SPARK_DRIVER_MEMORY is used in YARN deploy mode; it specifies the memory for the Driver, which runs your application and communicates with the Cluster Manager.
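To make the relationship concrete, here is a minimal sketch of what conf/spark-env.sh might look like. The sizes (8g, 2g, 1g) are made-up values chosen for illustration, not recommendations, and the comments only restate the explanation above.

```shell
# conf/spark-env.sh: illustrative values only (assumptions, not recommendations)

# Standalone mode: total memory one Worker may hand out to all Executors
# running on this machine.
export SPARK_WORKER_MEMORY=8g

# Memory for a single Executor. With the values here, at most 8g / 2g = 4
# Executors fit on one Worker as far as memory is concerned.
export SPARK_EXECUTOR_MEMORY=2g

# Memory for the Driver process that runs the application.
export SPARK_DRIVER_MEMORY=1g
```

The driver and executor sizes can also be given per application on the command line, using the --driver-memory and --executor-memory options of spark-submit.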