What is the difference between spark.task.cpus and --executor-cores

In my mapPartitions stage there is multi-threaded work to do, so I use a thread pool and want to run a task in parallel. But I cannot tell these two parameters apart. My guess is that I can set --executor-cores to 5 and run 4 threads within my task. Is this right?

asked Dec 24 '22 by cstur4

1 Answer

spark.task.cpus is the number of cores to allocate for each task, while --executor-cores specifies the number of cores per executor.
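As a minimal sketch of how the two settings combine (the values 8 and 4 are illustrative, not taken from the question), each executor can run at most executor cores / task cpus tasks at once:

    import org.apache.spark.sql.SparkSession

    // Sketch: the same pairing set programmatically instead of on the
    // spark-submit command line; the values are illustrative.
    val spark = SparkSession.builder()
      .appName("cpus-vs-executor-cores")
      .config("spark.executor.cores", "8") // config equivalent of --executor-cores 8
      .config("spark.task.cpus", "4")      // cores reserved for each task
      .getOrCreate()
    // Concurrency per executor = 8 / 4 = 2 tasks running at the same time.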

There is a small difference between executors and tasks, as explained here.

To find out how many threads you can run per core, go through this post.

As per the links:

When you create the SparkContext, each worker starts an executor. This is a separate process (a JVM). The executors connect back to your driver program, and the driver can then send them commands such as flatMap, map, and reduceByKey; these commands are tasks.
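A minimal sketch of that flow, assuming the SparkSession named spark built above; transformations only become tasks on the executors once an action runs:

    // One task per partition per stage is shipped to the executors.
    val rdd = spark.sparkContext.parallelize(1 to 1000, numSlices = 8)
    val doubled = rdd.map(_ * 2)      // transformation: becomes tasks when an action runs
    val total = doubled.reduce(_ + _) // action: the driver schedules the tasks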

To find out how many threads your CPU supports per core, run lscpu and check the value of Thread(s) per core.
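Applied to the question's setup, here is a hedged sketch of running a thread pool inside mapPartitions. It reuses the rdd from the sketch above, the 4-thread pool matches the illustrative spark.task.cpus=4 so the executor's cores are not oversubscribed, and expensiveWork is a hypothetical stand-in for the real per-record work:

    import java.util.concurrent.Executors
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration.Duration

    def expensiveWork(x: Int): Int = x * x // hypothetical stand-in for real work

    val out = rdd.mapPartitions { iter =>
      // 4 threads per task matches spark.task.cpus=4 from the sketch above.
      val pool = Executors.newFixedThreadPool(4)
      implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
      // Materialize the futures first so all 4 threads are kept busy,
      // then block on the results before shutting the pool down.
      val futures = iter.map(x => Future(expensiveWork(x))).toList
      val results = futures.map(f => Await.result(f, Duration.Inf))
      pool.shutdown()
      results.iterator
    }
    out.count() // action: forces the partition-level work to run

The key point is that the pool size should match spark.task.cpus, not --executor-cores: Spark only knows to reserve spark.task.cpus cores per task, so a larger pool would compete with other tasks on the same executor.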

answered Apr 06 '23 by Amit Kumar