I've been setting up a Spark standalone cluster setup following this link. I have 2 machines; The first one (ubuntu0) serve as both the master and a worker, and the second one (ubuntu1) is just a worker. Password-less ssh has been properly configured for both machines already and was tested by doing SSH manually on both sides.
Now when I tried to ./start-all.ssh, both master and worker on the master machine (ubuntu0) were started properly. This is signified by (1)WebUI being accessible (localhost:8081 on my part) and (2) Worker registered/displayed on the WebUI. However, the other worker on the second machine (ubuntu1), was not started. The error displayed was:
ubuntu1: ssh: connect to host ubuntu1 port 22: Connection timed out
Now this is quite weird already given that I've properly configured the ssh to be password-less on both sides. Given this, I accessed the second machine and tried to start the worker manually using these commands:
./spark-class org.apache.spark.deploy.worker.Worker spark://ubuntu0:7707
./spark-class org.apache.spark.deploy.worker.Worker spark://<ip>:7707
However, below is the result:
14/05/23 13:49:08 INFO Utils: Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
14/05/23 13:49:08 WARN Utils: Your hostname, ubuntu1 resolves to a loopback address:
127.0.1.1; using 192.168.122.1 instead (on interface virbr0)
14/05/23 13:49:08 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
14/05/23 13:49:09 INFO Slf4jLogger: Slf4jLogger started
14/05/23 13:49:09 INFO Remoting: Starting remoting
14/05/23 13:49:09 INFO Remoting: Remoting started; listening on addresses :
[akka.tcp://[email protected]:42739]
14/05/23 13:49:09 INFO Worker: Starting Spark worker ubuntu1.local:42739 with 8 cores,
4.8 GB RAM
14/05/23 13:49:09 INFO Worker: Spark home: /home/ubuntu1/jaysonp/spark/spark-0.9.1
14/05/23 13:49:09 INFO WorkerWebUI: Started Worker web UI at http://ubuntu1.local:8081
14/05/23 13:49:09 INFO Worker: Connecting to master spark://ubuntu0:7707...
14/05/23 13:49:29 INFO Worker: Connecting to master spark://ubuntu0:7707...
14/05/23 13:49:49 INFO Worker: Connecting to master spark://ubuntu0:7707...
14/05/23 13:50:09 ERROR Worker: All masters are unresponsive! Giving up.
Below are the contents of my master and slave\worker spark-env.ssh:
SPARK_MASTER_IP=192.168.3.222
STANDALONE_SPARK_MASTER_HOST=`hostname -f`
How should I resolve this? Thanks in advance!
For those who are still encountering error(s) when it comes to starting workers on different machines, I just want to share that using IP addresses in conf/slaves worked for me. Hope this helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With