
Spark Standalone Cluster - Slave not connecting to Master


I am trying to setup a Spark standalone cluster following the official documentation.

My master is on a local vm running ubuntu and I also have one worker running in the same machine. It is connecting and I am able to see its status in the WebUI of the master.

Here is the WebUI screenshot showing the local worker registered with the master:

[master WebUI screenshot]

But when I try to connect a slave from another machine, I am not able to do it.

This is the log message I get in the worker when I start it from another machine. I have tried using start-slaves.sh from the master after updating conf/slaves, and also start-slave.sh spark://spark:7077 from the slave.

[Master hostname - spark; Worker hostname - worker]

    15/07/01 11:54:16 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@spark:7077] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://sparkMaster@spark:7077]].
    15/07/01 11:54:59 ERROR Worker: All masters are unresponsive! Giving up.
    15/07/01 11:54:59 INFO Utils: Shutdown hook called
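For reference, these are roughly the two ways I tried to start the worker (the file contents are sketched from memory, not copied verbatim):

    # conf/slaves on the master -- one worker hostname per line
    worker

    # attempt 1: from the master, start all workers listed in conf/slaves
    sbin/start-slaves.sh

    # attempt 2: directly on the worker machine, pointing at the master URL
    sbin/start-slave.sh spark://spark:7077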

When I try to telnet from the slave to the master, this is what I get -

    root@worker:~# telnet spark 7077
    Trying 10.xx.xx.xx...
    Connected to spark.
    Escape character is '^]'.
    Connection closed by foreign host.

Telnet seems to work, but the connection is closed as soon as it is established. Could this have something to do with the problem?

I have added the master and slave IP addresses in /etc/hosts on both machines. I followed all the solutions given at SPARK + Standalone Cluster: Cannot start worker from another machine, but they have not worked for me.
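The /etc/hosts entries on both machines look roughly like this (the addresses below are placeholders, not the real ones):

    # /etc/hosts on both machines (placeholder IPs)
    10.0.0.1    spark
    10.0.0.2    worker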

I have the following config set in spark-env.sh on both machines:

export SPARK_MASTER_IP=spark

export SPARK_WORKER_PORT=44444

Any help is greatly appreciated.

asked Jul 01 '15 by Mor Eru

People also ask

How do I start master and slave in spark?

sbin/start-master.sh - Starts a master instance on the machine the script is executed on. sbin/start-slaves.sh - Starts a slave instance on each machine specified in the conf/slaves file. sbin/start-slave.sh - Starts a slave instance on the machine the script is executed on.
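Put together as shell commands (run from the Spark installation directory; <master-host> stands for whatever your master resolves to):

    # on the master machine
    sbin/start-master.sh

    # on the master: start a worker on every host listed in conf/slaves
    sbin/start-slaves.sh

    # or on a single machine: start one worker and point it at the master
    sbin/start-slave.sh spark://<master-host>:7077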

What happens if master node fails in spark?

So, yes, a failure of the master means the executors can no longer communicate with it, so they stop working. The driver also becomes unable to get job status from the master, so your application will fail.

Does Spark use a master-slave architecture?

The Apache Spark framework uses a master-slave architecture that consists of a driver, which runs as a master node, and many executors that run on the worker nodes in the cluster.


2 Answers

I ran into the exact same problem and just figured out how to get it to work.

The problem is that your Spark master is listening on a hostname (spark in your example), which lets the worker on the same host register successfully but makes registration from another machine fail with the command start-slave.sh spark://spark:7077.

The solution is to make sure SPARK_MASTER_IP is set to an IP address in conf/spark-env.sh

    SPARK_MASTER_IP=<your host ip> 

on your master node, and start your Spark master as usual. You can open the web GUI to make sure the master appears as spark://YOUR_HOST_IP:7077 after the start. Then, on the other machine, start-slave.sh spark://<your host ip>:7077 should start the worker and register it with the master successfully.
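For concreteness, a minimal sketch of the whole sequence, with 192.168.1.10 standing in for your real master IP:

    # conf/spark-env.sh on the master
    SPARK_MASTER_IP=192.168.1.10

    # restart the master so it binds to the IP address
    sbin/stop-master.sh
    sbin/start-master.sh

    # on the other machine, register the worker
    sbin/start-slave.sh spark://192.168.1.10:7077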

Hope it helps.

answered Nov 03 '22 by user1600668


It depends on your Spark version; each needs a different configuration. If your Spark version is 1.6, add this line to conf/spark-env.sh so another machine can connect to the master:

SPARK_MASTER_IP=your_host_ip

and if your Spark version is 2.x, add these lines to your conf/spark-env.sh:

SPARK_MASTER_HOST=your_host_ip

SPARK_LOCAL_IP=your_host_ip
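Put together, the relevant part of conf/spark-env.sh would look something like this (192.168.1.10 is just a placeholder for your real IP):

    # conf/spark-env.sh (Spark 2.x)
    SPARK_MASTER_HOST=192.168.1.10   # address the master binds to and advertises
    SPARK_LOCAL_IP=192.168.1.10      # address used for this machine's Spark networking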

After adding these lines, run Spark:

./sbin/start-all.sh

If you did it right, you can see at <your_host_ip>:8080 that the Spark master URL is spark://<your_host_ip>:7077.
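You can also check this from the command line; the master web UI serves a JSON status page on the same port (assuming the default UI port 8080; treat the endpoint as an assumption for your version):

    # should list the spark://... URL and the registered workers
    curl http://<your_host_ip>:8080/json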

Be careful: your_host_ip should not be localhost; it must be exactly the host IP that you set in conf/spark-env.sh.

After that, you can connect another machine to the master with the command below:

./sbin/start-slave.sh spark://your_host_ip:7077

answered Nov 04 '22 by Hamid