I'm using the Spark Standalone Mode tutorial page to install Spark in standalone mode.
1- I started the master with:
./sbin/start-master.sh
2- I started a worker with:
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://ubuntu:7077
Note: spark://ubuntu:7077 is my master URL, which I can see in the Master web UI.
Problem: With the second command, the worker starts successfully, but it can't associate with the master. It retries repeatedly and then gives this message:
15/02/08 11:30:04 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@ubuntu:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: ubuntu/127.0.1.1:7077
15/02/08 11:30:04 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$RegisterWorker] from Actor[akka://sparkWorker/user/Worker#-1296628173] to Actor[akka://sparkWorker/deadLetters] was not delivered. [20] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
15/02/08 11:31:15 ERROR Worker: All masters are unresponsive! Giving up.
What is the problem?
Thanks
To install Spark in standalone mode, you simply place a compiled version of Spark on each node of the cluster. You can obtain a pre-built version of Spark with each release or build it yourself.
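For example, each node could fetch and unpack a pre-built release roughly like this (the version and download URL are only an illustration; substitute the release you actually want):
wget https://archive.apache.org/dist/spark/spark-2.4.7/spark-2.4.7-bin-hadoop2.7.tgz
tar xzf spark-2.4.7-bin-hadoop2.7.tgz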
Is it possible to use the master node (the PC) as both master and slave in a Spark cluster? Is it possible to have 2 slaves and 1 master node? Yes, it is possible; you can configure the same machine to run both the master and a worker, and there are many guides available for it.
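For example, to run a master and a worker on the same machine, something like this should work (a sketch: the worker script is sbin/start-slave.sh on Spark 2.x and sbin/start-worker.sh on newer releases, and the hostname is a placeholder):
./sbin/start-master.sh
./sbin/start-slave.sh spark://<master-hostname>:7077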
1. The driver runs the main() program where the SparkContext is created; it then interacts with the cluster manager (in standalone mode, the Spark master) to schedule the job execution and perform the tasks.
2. The workers are the processes that can run in parallel to perform the tasks scheduled by the driver program.
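As a concrete illustration (a sketch assuming a Spark 2.x layout; the jar name depends on your Spark/Scala version, so treat it as a placeholder), submitting one of the bundled examples to the standalone master runs the driver's main() on the submitting machine, while the workers execute the tasks:
bin/spark-submit \
  --master spark://<master-hostname>:7077 \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.11-2.4.7.jar 100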
In my case, using Spark 2.4.7 in standalone mode, I created a passwordless ssh key using ssh-keygen, but I still got asked for the worker's password when starting the cluster.
What I did was follow the instructions here: https://www.cyberciti.biz/faq/how-to-set-up-ssh-keys-on-linux-unix/
This line solved the problem: ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@server-ip
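Putting the whole sequence together (user@server-ip is a placeholder for each worker machine):
ssh-keygen -t rsa                                    # accept the defaults, empty passphrase
ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@server-ip
ssh user@server-ip                                   # should now log in without a password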
I usually start from the spark-env.sh template and set the properties that I need. For a simple cluster you need:
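For instance, a minimal spark-env.sh might contain something like this (the address and resource values are placeholders, not recommendations; on older releases the first variable is SPARK_MASTER_IP):
export SPARK_MASTER_HOST=192.168.1.10   # address the master binds to and advertises
export SPARK_WORKER_CORES=4             # cores each worker offers to applications
export SPARK_WORKER_MEMORY=4g           # memory each worker offers to applications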
Then, create a file called "slaves" in the same directory as spark-env.sh and add the slaves' IPs or hostnames (one per line). Make sure you can reach all slaves through ssh.
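For example, the slaves file is just one worker address per line (placeholder IPs; newer Spark releases call this file "workers"):
192.168.1.11
192.168.1.12
192.168.1.13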
Finally, copy this configuration to every machine in your cluster. Then start the entire cluster by executing the start-all.sh script and try spark-shell to check your configuration.
> sbin/start-all.sh
> bin/spark-shell
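If everything is wired up correctly, the master web UI (http://<master-hostname>:8080 by default) should list all the workers, and you can also point the shell at the cluster explicitly:
> bin/spark-shell --master spark://<master-hostname>:7077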
You can set
export SPARK_LOCAL_IP="<Your-IP>"   # the IP address Spark binds to on this node
in $SPARK_HOME/conf/spark-env.sh
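After adding that line (use an address the other nodes can actually reach, rather than 127.0.1.1), restart the standalone daemons so the change takes effect:
sbin/stop-all.sh
sbin/start-all.sh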