Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there any good algorithm for detecting the clients which stopped sending heartbeats to server?

Suppose that millions of clients are sending heartbeats to one server within a fixed interval, so server judges that the client stopped sending heartbeats for a time larger than the interval is failed. If server just maintains a map and keeps iterating every client to check if the client is time out, that will produce a O(n) complexity, which is horrible to millions of clients. Are there some algorithms to make a O(log(n)) complexity just like some heaps or binary trees to solve problems like this? Thanks.

like image 853
rainer rainer Avatar asked May 26 '15 09:05

rainer rainer


3 Answers

If you keep a double linked list and a hashmap, you could use the hashmap to find the list entry that corresponds to the client that sends a heartbeat. After receiving the heartbeat, you unchain the list entry and put it to the end of the linked list.

The entry at the beginning of the linked list is a candidate "died" client. The linked list can be traversed up to the first client alive.

Some considerations:

  1. Add a client: create a list entry, add it to the hashmap and to the end of the linked list. O(1)

  2. Remove a client: find the client list entry by a lookup in the hashmap. Unchain the client. remove from the hashmap. O(1).

  3. Process a heartbeat: find the client list entry by a lookup in the hashmap. unchain the client list entry. Add the list entry to the end of the linked list. O(1)

  4. Find a dead client: Look at the first entry of the linked list. O(1)

like image 193
Ronald Avatar answered Sep 20 '22 11:09

Ronald


Approach 1

The simplest approach that will give you O(log N) performance is to use priority queue (a heap). In this queue, for each client you keep a pair <last-heartbeet-time, client-id>, with last-heartbeet-time being the UTC time when you last received a heartbeet from that client, and making sure that the elements with smaller (earlier) last-heartbeet-time are on the queue top.

Each time a client sends a heartbeet, you change the value in the corresponding pair, thus moving the client into a later position in the queue. And when you want to remove non-active clients, you just extract data from the queue until you reach clients that have last-heartbeet-time more than your threshold.

Update is O(log N) per heartbeet (see here or here under "Index priority queue" for more details), removal is O(log N) per removed client. Note that this approach does not require to iterate over all clients to look for those that should be removed. It iterates only over those that are actually removed. So if you have N clients, M heartbeet requests in total, and K removal in total, then this approach will process them in O(M log N + K log N) time.

UPD: this solution can also handle the situation if your clients have different time-to-live, that is if the heartbeets from different clients have different expiration time. You just need to store not the last-heartbeet-time, but the expiration-time-of-last-heartbeet.

UPD2: please see Ronald's answer for a similar idea, but not using a heap and achieving O(1) on each iteration — for the case all your clients have the same time-to-live it is better than above (you do not need a heap because every time you update some value, it moves to the end of queue).

Approach 2

Another approach can be as following. Keep an ordinary queue (FIFO) for all the heartbeets that you have received, and a separate array where for each client you store how much heartbeets of that client are already in the queue.

Every time a new heartbeet arrives, you push it into the queue and increase the number of heartbeets for that client.

Every time you want to remove inactive clients, you start popping data from the queue. You will be popping heartbeets in the order they have arrived, starting from the oldmost. Every time you pop a heartbeet, you decrease the corresponding value in the array and check whether this value has become zero. If if has become zero, then this client is inactive. You continue pops until you come to young enough heartbeets.

This might seem slow, but note that every heartbeet is popped only once, so the total number of pops will not be bigger that the total number of heartbeets receibed, hence the total complexity of this solution is just O(M)! You cannot have a faster solution just because you need to process all the incoming heartbeets, which is already O(M), so if your application can handle the incoming flux of heartbeets, it should also be able to handle this solution.

A pseudocode for this solution is below

def receive_heartbeet(hb)
    q.push(hb)
    nActive[hb.client]++

def find_inactive
    while q.front().time < currentTime - threshold
        hb = q.pop()
        nActive[hb.client]--
        if nActive[hb.client] == 0
            mark hb.client as inactive
like image 39
Petr Avatar answered Sep 19 '22 11:09

Petr


I would suggest an approach slightly different from the first approach suggested by @Petr.

You would need a HashMap<client, FIFO<heartbeatDate>> clientHeartbeats and a Heap<heartbeatDate, client> heartbeats which is ordered in an ascending manner by the heartbeatDate (the first element is the earliest heartbeat).

When a client sends a heartbeat you insert it in heartBeats (time complexity = O(log(n))) and in clientHeartbeats (time complexity = O(1)).

When the first heartbeat in heartbeats is out of the given time interval you just remove it from the heap (time complexity = O(log(n))) and remove the heartbeat from clientHeartbeats[removed heartbeat's client] (time complexity = O(1)). If the list becomes empty then that particular client has not sent a heartbeat in the indicated time interval.

Let me know if anything is unclear.

like image 35
Gentian Kasa Avatar answered Sep 19 '22 11:09

Gentian Kasa