Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

the difference between "one Executor per Core vs one Executor with multiple Core"

we know that each task is performed in one core at the time. lets say we have cluster of node with this configuration :

10 node. 16 core per node. 64 gb Ram per node.

my question is what is the difference between to have 1 executor with 16 core and 16 executor with 1 CORE ???

i mean :enter image description here VS enter image description here

i get inspired from this source: https://spoddutur.github.io/spark-notes/distribution_of_executors_cores_and_memory_for_spark_application.html?fbclid=IwAR3xiFLBXBkwX2SrcJFZU0tfHU7Gssp-NJstfLDSRSRZzJgK6ybvJjSVcpY

Thank in advance

like image 438
Amir Boutaghou Avatar asked Dec 05 '22 09:12

Amir Boutaghou


2 Answers

dassum's answer is absolutely correct, but to dig deeper into this:

  • Tasks have lower overhead than running whole executor
  • Data exchange is faster within the same process than between different processes
  • Broadcasts (e.g. when you join a very small DataFrame to a huge multi-paritioned one) send copies of data to each of the executors, so the more executors, the more copies of this data have to be done

So by running 16 executors with one core each, you may see a performance drop when compared to 1 executor with 16 cores.

However!

  • JVM doesn't work well with > 200GB of memory - Cloudera documentation recommendeds 64GB as a memory limit for single executor to limit garbage collection issues
  • as mentioned in the article you linked, HDFS pretty much hits a throughput limit on 5 cores, so if you are running your cluster with YARN, that's the sensible limit

The cluster configuration matters in this case very much. On top of that partitioning of your data is another crucial factor. In the end if you have fewer partitions than available threads you won't be utilizing all of your cluster since each partition can be processed in only one thread. Another interesting case is when you have one more partition than number of cores - which is going to double your processing time assuming roughly equal size of partitions.

like image 130
Daniel Avatar answered Jan 03 '23 04:01

Daniel


1 executor with 16 core means you will have 1 JVM which can run maximum of 16 tasks

16 executor with 1 CORE means you will have 16 JVM and each JVM can run one task.

like image 20
dassum Avatar answered Jan 03 '23 03:01

dassum