Using the spark-submit command with --master yarn --deploy-mode cluster causes larger scheduler delays than using --master yarn --deploy-mode client.
[Screenshot: task performance results]
This primarily concerns jobs that call the collect operation on RDDs.
The Spark application started in client mode takes approximately 3-4 minutes, whereas in cluster mode it takes 6-7 minutes. The size of each task within the stages is less than 100 KB. The cluster has 8 data nodes and runs Cloudera Manager 5.9.0.
Solution for this particular case: the problem was caused by a broken Ethernet cable in the cluster infrastructure. After replacing it, the run time dropped significantly.
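For anyone wondering why a network fault would show up as scheduler delay: as far as I understand it, the Spark UI derives that figure as whatever is left of a task's wall-clock duration after subtracting executor run time, task deserialization, result serialization and result fetching. That remainder includes shipping the task to the executor and sending the result back, so a slow link inflates it. The sketch below only illustrates that arithmetic; the TaskTimes class and all the numbers are made up for the example and are not part of any Spark API.

```python
from dataclasses import dataclass


@dataclass
class TaskTimes:
    """Per-task timings in milliseconds (hypothetical container, not a Spark class)."""
    duration: int                   # wall-clock time from task launch to finish
    executor_run_time: int          # time the executor spent running the task
    deserialize_time: int           # time spent deserializing the task on the executor
    result_serialization_time: int  # time spent serializing the task result
    getting_result_time: int = 0    # time the driver spent fetching the result


def scheduler_delay(t: TaskTimes) -> int:
    """Return the part of the task duration not accounted for by the other metrics."""
    accounted = (
        t.executor_run_time
        + t.deserialize_time
        + t.result_serialization_time
        + t.getting_result_time
    )
    # Clamp at zero so inconsistent timings never produce a negative delay.
    return max(0, t.duration - accounted)


# Made-up numbers: same executor work, but the second task spends much longer in transit.
healthy = TaskTimes(duration=1200, executor_run_time=1100,
                    deserialize_time=40, result_serialization_time=10)
degraded = TaskTimes(duration=4200, executor_run_time=1100,
                     deserialize_time=40, result_serialization_time=10)

print(scheduler_delay(healthy))   # 50
print(scheduler_delay(degraded))  # 3050
```

If the executor run time looks the same in both deploy modes and only this remainder grows, the network path between driver and executors is a reasonable thing to check.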