For example, if the number of executors is 40 but the number of tasks is 80, that means each executor would be running two tasks in parallel. Also, my functions (which the tasks execute) are not ordinary functions; I call external programs inside them, so each task actually takes several minutes to complete. My question is: how does Spark manage that? Would those tasks share the executor's JVM? What about the number of cores, would it be divided between those two tasks? What if I don't want those two tasks to run simultaneously, but want to execute them in a round-robin fashion, that is, run the first task with all of the executor's cores, and only when it is finished, run the second task?
It depends on how you allocate your resources, i.e. the total number of cores, the cores per executor, and the memory allocated to each executor.
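For instance, here is a minimal sketch of setting those knobs when building the session; the sizing (40 executors with 4 cores and 8 GB each) is only an assumption for illustration:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sizing: 40 executors, 4 cores and 8 GB of memory per executor.
// The same properties can be passed via spark-submit or spark-defaults.conf.
val spark = SparkSession.builder()
  .appName("resource-allocation-sketch")
  .config("spark.executor.instances", "40")
  .config("spark.executor.cores", "4")
  .config("spark.executor.memory", "8g")
  .getOrCreate()
```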
It also depends on how you write your code to attain maximum parallelism. If two tasks are independent of each other, they will run in parallel; if one task depends on the result of a previous task, they will execute serially.
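As a rough sketch (assuming the `spark` session from above), tasks within one stage are independent and run in parallel, while a shuffle introduces a stage boundary that forces the stages to run one after the other:

```scala
// 80 partitions -> 80 tasks in the first stage; they are independent of each
// other and run in parallel across however many cores the executors provide.
val nums = spark.sparkContext.parallelize(1 to 80, numSlices = 80)
val doubled = nums.map(_ * 2)

// reduceByKey is a wide dependency: it starts a new stage whose tasks can only
// begin after every task of the previous stage has finished, so the two stages
// execute serially with respect to each other.
val sums = doubled.map(x => (x % 10, 1L)).reduceByKey(_ + _)
sums.collect().foreach(println)
```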
Yes, the number of cores is divided between the two tasks: each task occupies one core by default, so an executor running two tasks at once effectively splits its available cores between them (see the configuration sketch after the next paragraph).
For executing the tasks in a round-robin fashion, you need to define your partitioning scheme and allocate the resources accordingly; this ensures that each task is executed only after the previous one has finished.
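One possible way to prevent the two tasks from running simultaneously, offered as a hedged sketch rather than the only approach, is to make each task reserve all of an executor's cores via spark.task.cpus, so the second task can only start once the first has released them:

```scala
import org.apache.spark.sql.SparkSession

// Each task claims all 4 cores of its executor, so only one task slot exists
// per executor; with 40 executors and 80 tasks, every executor runs its two
// tasks one after the other instead of in parallel. Values are assumptions.
val spark = SparkSession.builder()
  .appName("one-task-per-executor-sketch")
  .config("spark.executor.instances", "40")
  .config("spark.executor.cores", "4")
  .config("spark.task.cpus", "4")
  .getOrCreate()
```

On the code side, controlling the number of partitions (for example with repartition or coalesce) is the complementary knob, since the partition count determines how many tasks a stage produces.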