
Watches and ephemeral nodes don't work when the state of Zookeeper changes automatically?

I have a very strange case with the Python Kazoo library. Here is what my code below does:

As soon as I connect to Zookeeper using the kazoo library, I create an ephemeral node, set a watch on another node, and then keep the program running forever in an infinite loop. I have also added a listener that monitors the connection state.

At first everything works perfectly fine: the ephemeral node is up and the watch on my znode works.

Sometimes, though, I see pretty weird behaviour because of connection interruptions or drops. The listener I mentioned above prints every state change, and I always see it print Lost, Suspended, Connected, I believe because of the connection interruptions. After that my ephemeral node dies and my watch on the znode stops working as well.

Below is my code, which runs forever:

#!/usr/bin/python

import time

from kazoo.client import KazooClient
from kazoo.client import KazooState


def watch_host(event):
    # Called when the watch set on the znode below fires
    print(event)


def my_listener(state):
    if state == KazooState.LOST:
        # Register somewhere that the session was lost
        print("Lost")
    elif state == KazooState.SUSPENDED:
        # Handle being disconnected from Zookeeper
        print("Suspended")
    else:
        # Handle being connected/reconnected to Zookeeper
        # what are we supposed to do here?
        print("Being Connected/Reconnected")


zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

zk.add_listener(my_listener)

# start an ephemeral node
zk.create("/my/example/h0", b"some value", ephemeral=True)

# put a watch on my znode
children = zk.get_children("/my/example/test1", watch=watch_host)


# keep the process alive forever
while True:
    time.sleep(5)

Is there any way to overcome this problem? Whenever my Zookeeper state changes to Lost, Suspended or Connected, I want my ephemeral node to be up, by creating it again if that is the right approach, and my watch on the znode to keep working.

Because I will be running my program forever, if the Zookeeper state changes for whatever reason due to connection interruptions and the client gets connected again automatically, I need to make sure that my ephemeral node is up and that my watches on the znode start working again automatically.

Currently my ephemeral node dies and my watches stop working whenever the state changes automatically.

Any idea how to overcome this problem?

arsenal asked Nov 24 '13 05:11



1 Answer

Here is the thing: when there is a state change in the connection, your watcher also gets triggered. The watcher is given an event, which can be something like nodeDataChanged or nodeChildrenChanged. However, since you cannot be notified of an event you're interested in while your session is terminated or there is a connection issue, your watcher is notified of these session issues as well. I believe the event type for this is "None".

From http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches

Things to Remember about Watches

  • Watches are one time triggers; if you get a watch event and you want to get notified of future changes, you must set another watch.
  • Because watches are one time triggers and there is latency between getting the event and sending a new request to get a watch you cannot reliably see every change that happens to a node in ZooKeeper. Be prepared to handle the case where the znode changes multiple times between getting the event and setting the watch again. (You may not care, but at least realize it may happen.)
  • A watch object, or function/context pair, will only be triggered once for a given notification. For example, if the same watch object is registered for an exists and a getData call for the same file and that file is then deleted, the watch object would only be invoked once with the deletion notification for the file.
  • When you disconnect from a server (for example, when the server fails), you will not get any watches until the connection is reestablished. For this reason session events are sent to all outstanding watch handlers. Use session events to go into a safe mode: you will not be receiving events while disconnected, so your process should act conservatively in that mode.

So, long story short, your watcher should crack open the event to see what kind it is and respond appropriately to the None type by going into some kind of failover mode.
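As a rough sketch of what "cracking open the event" could look like with kazoo: the watcher below checks the event's type, goes into a conservative mode on a session event, and re-arms the one-shot watch on a real change. The class name ChildWatcher and its attributes are made up for illustration, and the code assumes that kazoo's EventType.NONE compares equal to the string "NONE" and that the event exposes a `type` attribute.

```python
# Assumed to match kazoo's EventType.NONE, the type used for session events.
SESSION_EVENT = "NONE"


class ChildWatcher(object):
    """Hypothetical watcher that tells session events from znode changes."""

    def __init__(self, zk, path):
        self.zk = zk          # a started KazooClient (or anything like it)
        self.path = path
        self.children = []
        self.safe_mode = False

    def arm(self):
        # Read the current children and register the one-shot watch again.
        self.children = self.zk.get_children(self.path, watch=self.on_event)
        self.safe_mode = False
        return self.children

    def on_event(self, event):
        if event.type == SESSION_EVENT:
            # Not a change to the znode: the connection or session has a
            # problem, so act conservatively until the watch is re-armed.
            self.safe_mode = True
        else:
            # A real change. The watch has fired and is gone, so set it
            # again; changes in between may have been missed.
            self.arm()
```

Here the watcher does not re-arm itself after a session event; that is left to whatever code reacts to the reconnection.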

What I usually do is make my Watcher objects listeners as well. When the reconnection happens, I respond by resetting my watches, checking that the appropriate znodes are present and creating them when necessary.
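A minimal sketch of that approach for the code in the question might look like this. It is not the exact code I use: the names Registration and refresh are made up for illustration, the state names are assumed to equal kazoo's KazooState constants (plain strings, as far as I know), and the object is assumed to be created after zk.start() has returned. Since kazoo calls listeners on its connection thread, the listener only raises a flag and the actual Zookeeper calls happen in the main loop.

```python
import threading

# Assumed to equal KazooState.LOST / SUSPENDED / CONNECTED.
LOST, SUSPENDED, CONNECTED = "LOST", "SUSPENDED", "CONNECTED"


class Registration(object):
    """Hypothetical helper: listener and watcher in one object."""

    def __init__(self, zk, node_path, node_value, watched_path):
        self.zk = zk
        self.node_path = node_path
        self.node_value = node_value
        self.watched_path = watched_path
        self.children = []
        self._connected = True      # assumes zk.start() already returned
        self._dirty = threading.Event()
        self._dirty.set()           # first refresh() does the initial setup

    def listener(self, state):
        # Keep this cheap: remember the state and flag work on reconnect.
        self._connected = (state == CONNECTED)
        if self._connected:
            self._dirty.set()
        else:
            self._dirty.clear()

    def _watch(self, event):
        # Watches are one-shot, so any event means the watch must be set
        # again. While disconnected, the reconnect will flag that anyway.
        if self._connected:
            self._dirty.set()

    def refresh(self, timeout=None):
        """Wait up to `timeout` seconds for work, then redo the setup."""
        if not self._dirty.wait(timeout):
            return False
        self._dirty.clear()
        # The ephemeral node is only gone if the session expired; after a
        # mere suspension it is still there.
        if self.zk.exists(self.node_path) is None:
            self.zk.create(self.node_path, self.node_value, ephemeral=True)
        self.children = self.zk.get_children(self.watched_path,
                                             watch=self._watch)
        return True
```

With a real client this would be wired up roughly as reg = Registration(zk, "/my/example/h0", b"some value", "/my/example/test1") followed by zk.add_listener(reg.listener), with reg.refresh(5) taking the place of time.sleep(5) inside the while True loop. The Zookeeper calls in refresh can still raise if the connection drops at that moment, so a long-running program would want to catch that and try again on the next pass. kazoo also ships a ChildrenWatch recipe that re-arms the watch by itself, which may be simpler if only the watch matters.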

Matthew Daumen answered Oct 12 '22 23:10
