
How does a Spark executor run multiple tasks?

For example, if the number of executors is 40 but the number of tasks is 80, each executor would be running two tasks in parallel. Also, my functions (which the tasks execute) are not ordinary functions; they launch external programs, so each task actually takes several minutes to complete. My question is: how does Spark manage that? Would those tasks share the executor's JVM? What about the number of cores, would it be divided between those two tasks? And what if I don't want those two tasks to run simultaneously, but rather in round-robin fashion, i.e. run the first task with all of the executor's cores, and only when it is finished, run the second task?

asked Oct 19 '16 by pythonic


1 Answer

It depends on how you allocate your resources, i.e. the number of executors, the number of cores per executor, and the memory allocated to each executor.
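For example, here is a minimal PySpark sketch of how such an allocation might be requested; the property names are standard Spark settings, but the app name and the specific values (40 executors, 2 cores, 4g) are assumptions made purely for illustration:

```python
from pyspark.sql import SparkSession

# Hypothetical allocation: 40 executors, each with 2 cores and 4 GB of
# memory. With the default spark.task.cpus = 1, each executor can then
# run two tasks at the same time, and both share that executor's JVM.
spark = (
    SparkSession.builder
    .appName("resource-allocation-demo")
    .config("spark.executor.instances", "40")
    .config("spark.executor.cores", "2")
    .config("spark.executor.memory", "4g")
    .getOrCreate()
)
```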

It also depends on how you write your code to attain maximum parallelism. If two tasks are independent of each other, they will run in parallel; if one task depends on the result of a previous task, they will execute serially.
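A small PySpark sketch of that distinction (the data and the numbers are made up for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parallelism-demo").getOrCreate()
sc = spark.sparkContext

# 80 partitions -> 80 tasks in the first stage; these tasks are
# independent of each other and run in parallel across the executors.
pairs = sc.parallelize(range(1000), numSlices=80).map(lambda x: (x % 10, x))

# reduceByKey needs a shuffle, so the second stage depends on the output
# of the first one and only starts once that stage has finished.
sums = pairs.reduceByKey(lambda a, b: a + b)
print(sums.collect())
```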

Yes, the number of cores can be divided between the two tasks by creating two executors and allocating the available cores to them.
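Concretely, the number of tasks an executor runs at once is roughly spark.executor.cores divided by spark.task.cpus, and all of those tasks share the executor's JVM; the values in this sketch are hypothetical:

```python
from pyspark.sql import SparkSession

# Assumed figures: 4 cores per executor, 2 cores claimed by each task,
# so at most 4 / 2 = 2 tasks run concurrently on each executor.
spark = (
    SparkSession.builder
    .appName("cores-per-task-demo")
    .config("spark.executor.cores", "4")
    .config("spark.task.cpus", "2")
    .getOrCreate()
)
```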

For executing the tasks in round-robin fashion, you need to define the partitioning scheme and allocate the resources accordingly. This ensures that each task is executed only after the previous one has finished.
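One way to get that behaviour (a sketch assuming 40 executors with 4 cores each, not the only possible approach) is to let every task claim all of an executor's cores, so an executor runs a single task at a time and picks up the next one only when it finishes:

```python
from pyspark.sql import SparkSession

# Assumed figures: 40 executors with 4 cores each. Setting spark.task.cpus
# equal to spark.executor.cores means one task occupies a whole executor,
# so the 80 tasks are processed one after another on each executor
# instead of two at once.
spark = (
    SparkSession.builder
    .appName("round-robin-demo")
    .config("spark.executor.instances", "40")
    .config("spark.executor.cores", "4")
    .config("spark.task.cpus", "4")
    .getOrCreate()
)
```

This pairs with the partitioning point above: the number of partitions determines how many tasks there are in total, and with one task per executor at a time they are processed in successive waves.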

answered Oct 05 '22 by PradhanKamal