I noticed that the Spark master becomes unresponsive when I kill the leader ZooKeeper node (I have, of course, delegated leader election to ZooKeeper). Below is the error log I see on the Spark master node. Do you have any suggestions for resolving it?
15/06/22 10:44:00 INFO ClientCnxn: Unable to read additional data from server sessionid 0x14dd82e22f70ef1, likely server has closed socket, closing socket connection and attempting reconnect
15/06/22 10:44:00 INFO ClientCnxn: Unable to read additional data from server sessionid 0x24dc5a319b40090, likely server has closed socket, closing socket connection and attempting reconnect
15/06/22 10:44:01 INFO ConnectionStateManager: State change: SUSPENDED
15/06/22 10:44:01 INFO ConnectionStateManager: State change: SUSPENDED
15/06/22 10:44:01 WARN ConnectionStateManager: There are no ConnectionStateListeners registered.
15/06/22 10:44:01 INFO ZooKeeperLeaderElectionAgent: We have lost leadership
15/06/22 10:44:01 ERROR Master: Leadership has been revoked -- master shutting down.
Apache Spark is a unified analytics engine for large-scale data processing. For high availability, it can use ZooKeeper for leader election and some state storage: you launch multiple Masters in your cluster, all connected to the same ZooKeeper ensemble. One is elected "leader" and acts as the master, while the others remain in standby mode.
Check the master web UI at http://master:8080 (8080 is the default port of the standalone master's web UI), where master is the hostname of the Spark master machine. There you will see the Spark master URI, which by default is spark://master:7077. Quite a bit of other information lives there too if you run a Spark standalone cluster.
This is the expected behaviour. You have to set up several masters and specify the ZooKeeper URL in spark-env.sh on every master:
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181"
Note that ZooKeeper works by quorum: the ensemble is up only while a majority of its servers are up, which is why you should run an odd number of ZooKeeper servers. Since Spark depends on ZooKeeper, the Spark cluster will not be up unless the ZooKeeper quorum is maintained.
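To make the quorum rule concrete, here is a small sketch of the arithmetic. The function names quorum_size and tolerated_failures are made up for illustration; they only compute the majority rule and do not talk to ZooKeeper.

```shell
#!/bin/sh
# Majority (quorum) size for an ensemble of n ZooKeeper servers.
# Hypothetical helper, for illustration only.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a majority is still reachable.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

# Print the figures for a few ensemble sizes.
for n in 2 3 4 5; do
    echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With two servers no failure is tolerated, and four servers tolerate no more failures than three, which is why odd ensemble sizes are the usual choice.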
When you set up two (or n) masters and bring down a ZooKeeper node, the current master goes down, a new master is elected, and all the worker nodes attach to the new master.
You should have started your workers by giving them the full list of masters:
./start-slave.sh spark://master1:port1,master2:port2
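If you keep the master list in one place, the URL can be assembled rather than typed by hand. A minimal sketch follows; spark_master_url is a hypothetical helper, and the hosts master1, master2 with port 7077 are assumptions standing in for your own masters.

```shell
#!/bin/sh
# Join "host:port" arguments into one spark:// URL that lists every master.
# Hypothetical helper; hostnames and ports below are placeholders.
spark_master_url() {
    url=""
    for hp in "$@"; do
        if [ -z "$url" ]; then
            url="spark://$hp"
        else
            url="$url,$hp"
        fi
    done
    echo "$url"
}

MASTER_URL=$(spark_master_url master1:7077 master2:7077)
echo "$MASTER_URL"   # prints spark://master1:7077,master2:7077

# The same URL is what a worker would be started with, for example:
#   ./start-slave.sh "$MASTER_URL"
```

Applications should point at the same comma-separated list (for instance via spark-submit --master), so that they can also find whichever master is currently the leader.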
You have to wait 1-2 minutes before you will notice the failover.
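Rather than watching the clock, you could poll the masters until one reports itself as the leader. The sketch below assumes that each master's web UI listens on port 8080 and serves a JSON summary at /json/ containing a "status" field whose value is ALIVE for the leader; verify both against your Spark version. The function names master_status and wait_for_leader are hypothetical.

```shell
#!/bin/sh
# Extract the value of the "status" field from master JSON read on stdin.
# Assumes the field appears as "status" : "VALUE" on a single line.
master_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Poll each master's web UI until one reports ALIVE or the timeout expires.
# Usage: wait_for_leader TIMEOUT_SECONDS HOST...
# Port 8080 and the /json/ path are assumptions about the deployment.
wait_for_leader() {
    timeout_s=$1
    shift
    elapsed=0
    while [ "$elapsed" -lt "$timeout_s" ]; do
        for host in "$@"; do
            status=$(curl -s --max-time 2 "http://$host:8080/json/" | master_status)
            if [ "$status" = "ALIVE" ]; then
                echo "$host"   # name of the master that is now leader
                return 0
            fi
        done
        sleep 5
        elapsed=$((elapsed + 5))
    done
    return 1
}

# Example call (not run here): wait up to 3 minutes for master1 or master2.
#   wait_for_leader 180 master1 master2
```

A timeout somewhat above two minutes leaves room for the recovery window mentioned above.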