Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

what is driver memory and executor memory in spark? [duplicate]

I am new to spark framework and i would like to know what is driver memory and executor memory? what is the effective way to get the maximum performance from both of them?

like image 931
routh mahesh Avatar asked Jan 26 '23 13:01

routh mahesh


1 Answers

Spark need a driver to handle the executors. So the best way to understand is:

Driver

The one responsible to handle the main logic of your code, get resources with yarn, handle the allocation and handle some small amount of data for some type of logic. The Driver Memory is all related to how much data you will retrieve to the master to handle some logic. If you retrieve too much data with a rdd.collect() your driver will run out of memory. The memory for the driver usually is small 2Gb to 4Gb is more than enough if you don't send too much data to it.

Worker

Here is where the magic happens, the worker will be the one responsible to execute your job. The amount of memory depends of what you are going to do. If you just going to do a map function where you just going to transform the data with no type of aggregation, you usually don't need much memory. But if you are going to run big aggregations, a lot of steps and etc. Usually you will use a good amount of memory. And it is related to the size of your files that you will read.

Tell you a proper amount of memory for each case all depends of how your job will work. You need to understand what is the impact of each function and monitor to tune your memory usage for each job. Maybe 2Gb per worker is what you need, but sometimes 8Gb per workers is what you need.

like image 200
Thiago Baldim Avatar answered Mar 08 '23 08:03

Thiago Baldim