 

Large scheduler delay in Apache Spark tasks using deploy mode cluster

Running spark-submit with --master yarn --deploy-mode cluster causes larger scheduler delays than running it with --master yarn --deploy-mode client.
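To make the comparison concrete, here is a minimal sketch that builds the two command lines being compared. Only the --master and --deploy-mode flags come from the question; the application file name app.py and the helper build_submit_command are hypothetical placeholders, and a real submission would likely carry further options.

```python
import shlex


def build_submit_command(deploy_mode, app="app.py"):
    """Return the spark-submit argument list for a YARN deploy mode.

    `app` is a hypothetical placeholder for the real application file.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        app,
    ]


if __name__ == "__main__":
    # Print both variants so they can be run and timed side by side.
    for mode in ("client", "cluster"):
        print(shlex.join(build_submit_command(mode)))
```

Keeping every other argument identical means the deploy mode is the only difference between the two runs being timed.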

[Screenshot: task performance results]

This primarily affects jobs that call the collect operation on RDDs.

The Spark application takes approximately 3-4 minutes when started in client mode, compared with 6-7 minutes in cluster mode. The size of each task within the stages is less than 100 KB. The cluster has 8 data nodes and runs Cloudera Manager 5.9.0.

Vadym VM asked Nov 28 '16 15:11



1 Answer

The solution in this particular case: the problem was caused by a broken Ethernet cable in the cluster infrastructure. After the cable was replaced, the run time dropped greatly.

Vadym VM answered Sep 28 '22 08:09
