
Spark error: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I have a virtual machine on which spark-2.0.0-bin-hadoop2.7 is installed in standalone mode.

I ran ./sbin/start-all.sh to start the master and the slave.

When I run ./bin/spark-shell --master spark://192.168.43.27:7077 --driver-memory 600m --executor-memory 600m --executor-cores 1 on the machine itself, the application's status is RUNNING and I am able to run code in the spark shell.

[Screenshot: spark-shell run inside the virtual machine]

When I run exactly the same command from another machine on the network, the status is RUNNING again, but spark-shell throws WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources. I suspect the problem is not directly related to resources, because the same command works on the virtual machine itself but not when it comes from other machines.

[Screenshot: spark-shell run from another machine on the network]

I checked most of the topics related to this error and none of them solved my problem. I even disabled the firewall with sudo ufw disable just to make sure, but without success, following this link, which suggests:

Disable Firewall on the client: This was the solution that worked for me. Since I was working on prototype in-house code, I disabled the firewall on the client node. For some reason the worker nodes were not able to talk back to the client for me. For production purposes, you would want to open up only the specific ports required.
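For reference, opening just the known ports instead of disabling the firewall entirely might look like the following on Ubuntu with ufw. This is a minimal sketch assuming Spark's default standalone ports; the last two values are hypothetical placeholders for the driver and block manager ports, which only make sense if you first pin them to fixed values (see the answer below).

    sudo ufw allow 7077/tcp    # standalone master RPC port (spark://192.168.43.27:7077)
    sudo ufw allow 8080/tcp    # master web UI
    sudo ufw allow 8081/tcp    # worker web UI
    sudo ufw allow 4040/tcp    # driver application web UI
    sudo ufw allow 36000/tcp   # hypothetical spark.driver.port, if pinned
    sudo ufw allow 36001/tcp   # hypothetical spark.blockManager.port, if pinned

As the quoted advice hints, the rules that matter most here are on the client machine running spark-shell, because the workers need to connect back to the driver.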

asked Oct 11 '16 by Arsinux

1 Answer

There are two known reasons for this:

  1. Your application requires more resources (cores, memory) than allocated. Increasing the worker cores and memory should solve it; most other answers focus on this (see the spark-env.sh sketch after this list).

  2. While less known, the firewall may be blocking the communication between the master and the workers. This can happen especially if you are using a cloud service. According to Spark Security, besides the standard 8080, 8081, 7077 and 4040 ports, you also need to make sure the master and workers can communicate via SPARK_WORKER_PORT, spark.driver.port and spark.blockManager.port; the latter three are used when submitting jobs and are randomly assigned by the program if left unconfigured. You may try opening all ports to run a quick test (see the port-pinning sketch after this list).
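For point 1, a minimal sketch of raising the resources a standalone worker offers, assuming the default conf/spark-env.sh location; the 2 / 2g values are purely illustrative:

    # conf/spark-env.sh on each worker node (illustrative values)
    export SPARK_WORKER_CORES=2     # cores this worker offers to executors
    export SPARK_WORKER_MEMORY=2g   # memory this worker offers to executors

Restart the cluster with ./sbin/stop-all.sh followed by ./sbin/start-all.sh so the new values take effect, and check the master UI on port 8080 to confirm the worker registered with the expected cores and memory.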
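For point 2, a sketch of pinning the otherwise random ports so they can be opened explicitly in a firewall; the port numbers 35000/36000/36001 are placeholders, not values Spark requires:

    # conf/spark-env.sh on each worker node (hypothetical fixed port)
    export SPARK_WORKER_PORT=35000

    # launching the shell from the remote machine with pinned driver-side ports
    ./bin/spark-shell --master spark://192.168.43.27:7077 \
      --driver-memory 600m --executor-memory 600m --executor-cores 1 \
      --conf spark.driver.port=36000 \
      --conf spark.blockManager.port=36001

With the ports fixed, the executors on the workers must be able to reach the client machine on 36000 and 36001 (and the client must reach the master on 7077); if any of those connections is blocked, you get the "Initial job has not accepted any resources" warning even though the worker appears registered in the cluster UI.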

answered Nov 15 '22 by Fontaine007