 

How to connect master and slaves in Apache-Spark? (Standalone Mode)

Tags:

apache-spark

I'm using the Spark Standalone Mode tutorial page to install Spark in standalone mode.

1- I have started a master by:

./sbin/start-master.sh

2- I have started a worker by:

./bin/spark-class org.apache.spark.deploy.worker.Worker spark://ubuntu:7077

Note: spark://ubuntu:7077 is my master URL, which I can see in the Master WebUI.

Problem: With the second command, a worker starts successfully, but it can't associate with the master. It retries repeatedly and then gives this message:

15/02/08 11:30:04 WARN Remoting: Tried to associate with unreachable    remote address [akka.tcp://sparkMaster@ubuntu:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: ubuntu/127.0.1.1:7077
15/02/08 11:30:04 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$RegisterWorker] from Actor[akka://sparkWorker/user/Worker#-1296628173] to Actor[akka://sparkWorker/deadLetters] was not delivered. [20] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
15/02/08 11:31:15 ERROR Worker: All masters are unresponsive! Giving up.

What is the problem?

Thanks

Asked by Omid Ebrahimi on Feb 08 '15

People also ask

How do I run spark in standalone mode?

To install Spark in standalone mode, you simply place a compiled version of Spark on each node of the cluster. You can obtain pre-built versions of Spark with each release or build it yourself.
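As a rough sketch, placing a pre-built release on every node could look like this (the tarball name, user, and worker-node host below are placeholders, not values from the question):

tar -xzf spark-2.4.7-bin-hadoop2.7.tgz                # unpack the downloaded release locally
scp -r spark-2.4.7-bin-hadoop2.7 user@worker-node:~/  # copy the same build to each worker node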

How many masters and slaves can a spark cluster have?

Is it possible to use the master node (the PC) as both master and slave in a Spark cluster? Is it possible to have 2 slaves and 1 master node? Yes, both are possible; you can configure a node as both master and slave. Many guides are available for this.

What is master and worker node in spark?

The master/driver runs the main() program where the SparkContext is created; it interacts with the cluster manager to schedule job execution. The workers are processes that run in parallel to perform the tasks scheduled by the driver program.


3 Answers

In my case, using Spark 2.4.7 in standalone mode, I had created a passwordless SSH key using ssh-keygen, but was still asked for the worker's password when starting the cluster.

What I did was follow the instructions here https://www.cyberciti.biz/faq/how-to-set-up-ssh-keys-on-linux-unix/

This line solved the problem: ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@server-ip
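For reference, a minimal sketch of the whole key setup (user and server-ip are placeholders for your own worker login and address):

ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa          # generate a key pair with an empty passphrase
ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@server-ip   # install the public key on the worker
ssh user@server-ip                                     # should now log in without a password prompt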

Answered by Aya on Sep 28 '22


I usually start from the spark-env.sh template and set the properties that I need. For a simple cluster you need:

  • SPARK_MASTER_IP

Then create a file called "slaves" in the same directory as spark-env.sh and add the slaves' IPs (one per line). Make sure you can reach all slaves through ssh.
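For example, the two files might look like this (the IP addresses are placeholders, not values from the question):

# conf/spark-env.sh (copied from spark-env.sh.template)
export SPARK_MASTER_IP=192.168.1.10   # address the master binds to and advertises to workers

# conf/slaves -- one worker IP or hostname per line
192.168.1.11
192.168.1.12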

Finally, copy this configuration to every machine in your cluster. Then start the entire cluster by executing the start-all.sh script, and try spark-shell to check your configuration.

> sbin/start-all.sh
> bin/spark-shell
Answered by gasparms on Sep 28 '22


You can set export SPARK_LOCAL_IP="Your-IP" (the IP address Spark binds to on this node) in $SPARK_HOME/conf/spark-env.sh.
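For instance, a node's $SPARK_HOME/conf/spark-env.sh could contain the following (the address is a placeholder for that node's own IP):

export SPARK_LOCAL_IP=192.168.1.11   # bind Spark on this node to its real address instead of 127.0.1.1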

Answered by nikk on Sep 28 '22