the difference between "one Executor per Core vs one Executor with multiple Core"

Question

we know that each task is performed in one core at the time. lets say we have cluster of node with this configuration :

10 node. 16 core per node. 64 gb Ram per node.

my question is what is the difference between to have 1 executor with 16 core and 16 executor with 1 CORE ???

i mean : enter image description here VS

i get inspired from this source: https://spoddutur.github.io/spark-notes/distribution_of_executors_cores_and_memory_for_spark_application.html?fbclid=IwAR3xiFLBXBkwX2SrcJFZU0tfHU7Gssp-NJstfLDSRSRZzJgK6ybvJjSVcpY

Thank in advance

Daniel · Accepted Answer

dassum's answer is absolutely correct, but to dig deeper into this:

Tasks have lower overhead than running whole executor
Data exchange is faster within the same process than between different processes
Broadcasts (e.g. when you join a very small DataFrame to a huge multi-paritioned one) send copies of data to each of the executors, so the more executors, the more copies of this data have to be done

So by running 16 executors with one core each, you may see a performance drop when compared to 1 executor with 16 cores.

However!

JVM doesn't work well with > 200GB of memory - Cloudera documentation recommendeds 64GB as a memory limit for single executor to limit garbage collection issues
as mentioned in the article you linked, HDFS pretty much hits a throughput limit on 5 cores, so if you are running your cluster with YARN, that's the sensible limit

The cluster configuration matters in this case very much. On top of that partitioning of your data is another crucial factor. In the end if you have fewer partitions than available threads you won't be utilizing all of your cluster since each partition can be processed in only one thread. Another interesting case is when you have one more partition than number of cores - which is going to double your processing time assuming roughly equal size of partitions.

dassum · Answer

1 executor with 16 core means you will have 1 JVM which can run maximum of 16 tasks

16 executor with 1 CORE means you will have 16 JVM and each JVM can run one task.

the difference between "one Executor per Core vs one Executor with multiple Core"

Tags:

apache-spark

pyspark

Amir Boutaghou

2 Answers

Daniel

dassum

Recent Activity

Donate For Us

the difference between "one Executor per Core vs one Executor with multiple Core"

Tags:

apache-spark

pyspark

Amir Boutaghou

2 Answers

Daniel

dassum

Related questions

Recent Activity

Donate For Us