Why is Spark not distributing jobs to all executors, but only to one executor?

My Spark cluster has 1 master and 3 workers (on 4 separate machines, each machine with 1 core). The other settings are shown in pic-1 below, where spark.cores.max is set to 3 and spark.executor.cores is also set to 3.
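For reference, here is a minimal sketch of one way these settings could be passed through SparkConf (the app name and master URL are placeholders, and the same settings could equally come from spark-submit flags or spark-defaults.conf):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: the app name and master URL below are placeholders.
val conf = new SparkConf()
  .setAppName("my-app")                    // placeholder application name
  .setMaster("spark://master-host:7077")   // standalone master URL (placeholder)
  .set("spark.cores.max", "3")             // total cores the application may use across the cluster
  .set("spark.executor.cores", "3")        // cores requested per executor

val sc = new SparkContext(conf)
```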

But when I submit my job to the Spark cluster, I can see from the Spark web UI that only one executor is used (judging by the used memory and RDD blocks in pic-2), not all of them. In this case the processing speed is much slower than I expected.

Since I've set the max cores to 3, shouldn't all the executors be used for this job?

How can I configure Spark to distribute the current job to all executors, instead of having only one executor run it?

Thanks a lot.

------------------ pic-1: Spark settings

------------------ pic-2: Spark web UI executor usage

asked May 14 '15 by keypoint
1 Answer

You said you are running two receivers. What kind of receivers are they (Kafka, HDFS, Twitter)?

Which Spark version are you using?

In my experience, any receiver other than a file-based one permanently occupies 1 core. So when you say you have 2 receivers, 2 cores are permanently used for receiving the data, and you are left with only 1 core doing the actual work.
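To illustrate the core budget (a sketch only; it uses socket receivers and placeholder hosts/ports, not necessarily the receiver types you are actually running):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// With spark.cores.max = 3 and 2 receivers, 2 cores stay pinned to receiving,
// so only 1 core is left for processing the batches.
val conf = new SparkConf()
  .setAppName("receiver-core-budget")      // placeholder application name
  .set("spark.cores.max", "3")

val ssc = new StreamingContext(conf, Seconds(10))

// Each receiver-based input stream starts one long-running task that holds a core.
val stream1 = ssc.socketTextStream("host-a", 9999)  // receiver 1 -> 1 core
val stream2 = ssc.socketTextStream("host-b", 9999)  // receiver 2 -> 1 core

// Only the remaining core (3 - 2 = 1) does the batch processing.
stream1.union(stream2).count().print()

ssc.start()
ssc.awaitTermination()
```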

Please post the Spark master homepage screenshot as well, and a screenshot of the job's Streaming page.

answered Oct 02 '22 by Lokesh Kumar P