
What's the best way to quickly test multiple boxes for ping?

I have an application where I monitor and control a bunch of computers (probably 3 to 35 or so, probably local).

One of the things I monitor is uptime/ping status. One of the application's purposes is to restart the boxes, though sometimes they restart for other reasons.

I'd like to be able to pick up changes between pingable and non-pingable quickly.

I have a spin loop on a thread.

It seems to me that a blocking ping prevents the status from updating for a while, even if you run the pings in parallel (so that one box's ping doesn't block another's).

(Parallel implementation example. Note that the following is just off the top of my head, hasn't been implemented, and may contain errors.)

var startTime = DateTime.Now;
// Placeholder value: the original left the polling window unspecified.
var period = TimeSpan.FromSeconds(1);
Parallel.ForEach(boxes, (box) =>
{
    // Re-check the clock on every pass; keep pinging this box
    // until the polling window has elapsed.
    while (DateTime.Now - startTime < period)
    {
        box.CanPing.TryUpdate();
    }
});

where TryUpdate is just something like

using (var ping = new Ping())
{
    // Blocking call: returns once a reply arrives or the ping times out.
    var reply = ping.Send(IP);
    bool upStatus = (reply.Status == IPStatus.Success);
    this.Value = upStatus;
}

Alternatively, I tried using multiple SendAsync calls (multiple async pings at one time) to discover uptime as quickly as possible, with double-checked locking in the callback to SendAsync:

if(upStatus != this.Value)
{
    // Is it safe to have a non-static readonly lock object? All the examples
    // seem to use a static object, but that wouldn't scale to locking in
    // multiple instances of the containing class.
    lock (_lock)
    {
        if(upStatus != this.Value)
        {
            ...
        }
    }
}

It was an awful memory leak, but that may be because I was making too many async ping calls (each of which comes with a thread) too quickly, and not disposing of the Ping objects. If I limit myself to 3 per computer at a time, or put a longer pause in between, and Dispose() the Ping, do you think it would be a good idea?

What's the better strategy? Any other ideas?

Roman A. Taycher asked Jun 29 '13 09:06


2 Answers

This is a specific case of multithreading where you do not need the threads to make the program faster; you need them to make it more responsive. Your operations take little to no computing power. Therefore I would not be scared to create a single thread for each monitored computer. They are going to be in sleep() most of the time anyway. They should be created once, because thread creation is actually the most expensive thing here.

I would create an object hierarchy like this:

  • GUIProxy - would handle all GUI operations, like changing notification colors next to a computer's name
  • HostManager - would register new machines, remove old ones, and perform timing checks on Monitors
  • HostMonitor - would periodically and sequentially send pings to check a computer. More on its behavior later

Checking algorithm

In LANs, pings return within 1-2 ms of sending most of the time. Over the Internet the time may vary. I would have two ping-time thresholds set separately for each Monitor, depending on the machine's location. One would be a "warning" threshold (a yellow light or something similar in the GUI), triggered when the LAN ping is greater than 5 ms or the Internet ping is greater than 200 ms. The second would be an "error" threshold, with LAN > 1 s and Internet > 2 s or so. Each Monitor would send a ping, wait for an answer, and send another ping after receiving the answer. It should store lastPingSendTime, lastPingReceiveTime and currentPingSendTime. The first two are for determining latency; the last is for checking the delay in HostManager. Of course, the Monitor should handle timeouts and other system/network events properly.
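To make the two-threshold idea concrete, here is a minimal sketch. The type and member names (PingHealth, LatencyThresholds, Classify) are hypothetical and not part of the answer; the numeric values are the example figures quoted above, which you would likely tune per machine.

```csharp
using System;

// Hypothetical names: PingHealth, LatencyThresholds and Classify are
// illustrative only and do not come from the answer.
public enum PingHealth { Ok, Warning, Error }

public sealed class LatencyThresholds
{
    public TimeSpan Warning { get; }
    public TimeSpan Error { get; }

    public LatencyThresholds(TimeSpan warning, TimeSpan error)
    {
        if (error < warning)
            throw new ArgumentException("Error threshold must not be below the warning threshold.");
        Warning = warning;
        Error = error;
    }

    // Example values quoted in the answer: LAN 5 ms / 1 s, Internet 200 ms / 2 s.
    public static LatencyThresholds Lan =>
        new LatencyThresholds(TimeSpan.FromMilliseconds(5), TimeSpan.FromSeconds(1));

    public static LatencyThresholds Internet =>
        new LatencyThresholds(TimeSpan.FromMilliseconds(200), TimeSpan.FromSeconds(2));

    // Maps the time a ping took (or has been outstanding so far) to a status.
    public PingHealth Classify(TimeSpan elapsed)
    {
        if (elapsed > Error) return PingHealth.Error;
        if (elapsed > Warning) return PingHealth.Warning;
        return PingHealth.Ok;
    }
}
```

A manager could call something like `LatencyThresholds.Lan.Classify(DateTime.UtcNow - monitor.CurrentPingSendTime)` on each pass, assuming the monitor exposes such a timestamp.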

In HostManager, which also runs on a single thread, I would read currentPingSendTime on each Monitor and check it against that Monitor's thresholds. If a threshold is crossed, the GUIProxy would be notified so it can show the situation in the GUI.
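A rough sketch of what one such per-host monitor thread might look like follows. It is only an illustration under assumptions: the member names, the pause between pings, and the choice to store the timestamp as ticks (so another thread can read it atomically) are all mine, not the answer's.

```csharp
using System;
using System.Net.NetworkInformation;
using System.Threading;

// Hypothetical sketch of the HostMonitor idea; all member names are illustrative.
public sealed class HostMonitor : IDisposable
{
    private readonly string _host;
    private readonly int _timeoutMs;
    private readonly TimeSpan _pause;
    private readonly Thread _thread;
    private readonly ManualResetEventSlim _stop = new ManualResetEventSlim(false);
    private long _currentPingSendTicks;   // UTC ticks, written by the monitor thread
    private volatile bool _isUp;

    public HostMonitor(string host, int timeoutMs, TimeSpan pause)
    {
        _host = host;
        _timeoutMs = timeoutMs;
        _pause = pause;
        // Created once per host; the thread spends most of its life waiting.
        _thread = new Thread(Run) { IsBackground = true, Name = "monitor-" + host };
    }

    public bool IsUp => _isUp;

    // Read from the manager thread; Interlocked keeps the 64-bit read atomic.
    public DateTime CurrentPingSendTime =>
        new DateTime(Interlocked.Read(ref _currentPingSendTicks), DateTimeKind.Utc);

    public void Start() => _thread.Start();

    private void Run()
    {
        using (var ping = new Ping())   // one Ping instance reused for every check
        {
            while (!_stop.IsSet)
            {
                Interlocked.Exchange(ref _currentPingSendTicks, DateTime.UtcNow.Ticks);
                try
                {
                    // Synchronous ping: blocks only this host's own thread.
                    _isUp = ping.Send(_host, _timeoutMs).Status == IPStatus.Success;
                }
                catch (PingException)
                {
                    _isUp = false;      // e.g. the host name could not be resolved
                }
                _stop.Wait(_pause);     // sleep, but wake immediately on Dispose
            }
        }
    }

    public void Dispose()
    {
        _stop.Set();
        if (_thread.IsAlive) _thread.Join();
        _stop.Dispose();
    }
}
```

Usage would be along the lines of `var m = new HostMonitor("192.168.0.10", 1000, TimeSpan.FromMilliseconds(500)); m.Start();`, where the address and timings are placeholders.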

Advantages

  • You control the threads yourself
  • You can use the synchronous (simpler) ping method
  • The Manager will not hang, since it accesses Monitors asynchronously
  • You can implement an abstract Monitor interface, which you could use to monitor other things, not only computers

Disadvantages

  • correct Monitor threading implementation may not be simple

Dariusz answered Sep 29 '22 08:09


Depending on whether you require a scale-out solution, you could implement the state checking as Dariusz said (which is an absolutely legitimate approach).

That approach has only one disadvantage, which may or may not be relevant in your scenario: scaling up to hundreds or even thousands of monitored boxes or services will result in a huge number of threads. Although .NET applications in 64-bit mode support several thousand concurrent threads, I would not recommend spawning that many workers. The resource scheduler won't be your best friend anymore if you give it the job of scheduling such a huge number of workers.

Getting a solution that scales out is a little more difficult. Let's briefly return to the original problem: you want to monitor a bunch of boxes quickly, and pipelined (one after another) processing is not performing well. Considering that you may also monitor other services (TCP) in the future, waiting for timeouts would kill that approach completely.

Solution: Custom thread pooling or thread reuse

As you're dealing with a special sort of threading, which is affected by the time it takes to spawn a thread from the default thread pool, you need a solution that gets rid of the spawning issue. With scaling out in mind, I would recommend this way:

Use a custom or the default thread pool to spin up several threads that sit in a suspended state. Now your system wants to measure several boxes. Therefore: go to your prewarmed threads, take the first suspended (free) one, and reserve it for your monitoring job. Once you have the thread, you give it some sort of handle to your actual worker method (which the thread will invoke asynchronously). After the monitoring iteration has finished (which may take some time), the thread returns the result (a callback would be a good way) and puts itself back into suspended mode.

So this is just a custom scheduler with prewarmed threads. If you build the suspend/resume with ManualResetEvents, the threads are available nearly instantly.
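One possible shape for such a prewarmed pool is sketched below. Note the substitution: instead of hand-rolled ManualResetEvent signalling, this sketch parks the workers on a BlockingCollection, which gives the same "blocked until work arrives" behavior with less code. PrewarmedPool and its members are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch: PrewarmedPool is an illustrative name, and the queue-based
// parking replaces the ManualResetEvent signalling mentioned in the answer.
public sealed class PrewarmedPool : IDisposable
{
    private readonly BlockingCollection<Action> _jobs = new BlockingCollection<Action>();
    private readonly Thread[] _workers;

    public PrewarmedPool(int workerCount)
    {
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = new Thread(WorkLoop) { IsBackground = true };
            _workers[i].Start();   // started once, then parked until work arrives
        }
    }

    // Hands the job to an idle worker, or queues it if all workers are busy.
    public void Enqueue(Action job) => _jobs.Add(job);

    private void WorkLoop()
    {
        // Blocks without burning CPU while the queue is empty;
        // the loop ends once CompleteAdding has been called and the queue is drained.
        foreach (var job in _jobs.GetConsumingEnumerable())
        {
            try { job(); }
            catch (Exception) { /* a failing check must not kill the worker */ }
        }
    }

    public void Dispose()
    {
        _jobs.CompleteAdding();
        foreach (var worker in _workers) worker.Join();
        _jobs.Dispose();
    }
}
```

A monitoring job would then be enqueued as a delegate that performs the check and reports back through a callback, for example `pool.Enqueue(() => onResult(host, CheckHost(host)))`, where `onResult` and `CheckHost` are placeholders for your own code.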

Still want more performance?

If you're still looking for a little more performance and want to be able to tune the resulting system at a finer granularity, I would recommend specialized thread pools (as Zabbix does for monitoring). So you don't just allocate a bunch of threads that may invoke a custom method to check whether a box is reachable via ping or TCP; you allocate a separate pool per monitoring type. In the case of ICMP (ping) and TCP monitoring, you would create at least two thread pools whose threads already contain the basic knowledge of "how to check". In the case of a ping monitor, the thread would be ready and waiting with an initialized Ping instance that just needs a target to check. When you take the thread out of its suspended state, it immediately checks the host and returns the result. Afterwards it prepares for sleep (and, in this case, already initializes the environment for the next run). If you implement this well, you can even reuse resources like sockets.

All in all, this approach enables you to monitor 3, 35 or even hundreds of boxes without getting into trouble. Of course, monitoring is still limited and you shouldn't fork thousands of prewarmed threads. That's not the idea; the idea is that you define a maximum number of threads that are ready for use and just waiting for destinations to check. You don't have to deal with forking issues when starting monitoring for many hosts. You just have to deal with queuing if you monitor more hosts than your defined concurrency allows (and this may be much higher than what Parallel.ForEach gives you by default, since it relies on the thread pool, which starts with roughly one worker per core and adds more only gradually. Check the overloads of the method that take ParallelOptions to change this).
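For comparison, here is a minimal sketch of capping concurrency with the ParallelOptions overload of Parallel.ForEach. PingSweep and its parameters are hypothetical names. MaxDegreeOfParallelism is an upper bound only: the thread pool may still take a moment to ramp up to it, which is the spawning delay the prewarmed-pool approach is meant to avoid.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net.NetworkInformation;
using System.Threading.Tasks;

// Hypothetical helper: PingSweep is an illustrative name, not an existing API.
public static class PingSweep
{
    // Pings every host once, running at most maxConcurrency checks at a time.
    public static IDictionary<string, bool> Run(
        IEnumerable<string> hosts, int maxConcurrency, int timeoutMs)
    {
        var results = new ConcurrentDictionary<string, bool>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrency };

        Parallel.ForEach(hosts, options, host =>
        {
            bool up;
            try
            {
                // One Ping per check, disposed right away to avoid leaking handles.
                using (var ping = new Ping())
                {
                    up = ping.Send(host, timeoutMs).Status == IPStatus.Success;
                }
            }
            catch (PingException)
            {
                up = false;   // e.g. the host name could not be resolved
            }
            results[host] = up;
        });

        return results;
    }
}
```

A call might look like `PingSweep.Run(hostList, 16, 1000)`, with the concurrency limit and timeout chosen to suit your network.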

Absolute optimization

If you want to improve the system even further, don't give your scheduler and resource planner just a fixed count of prewarmed threads. Give it limits, such as a minimum of 4 and a maximum of 42 threads. The scheduler then starts and stops additional threads within these bounds as needed. This is useful if your system lowers its monitoring rate overnight and you don't want the suspended threads to hang around.

This would be an A+ implementation: not only could you start monitoring from a cold state immediately for at least some hosts and quickly for many hosts, you would also give back resources you really don't need for long periods.

Daniel Nachtrub answered Sep 29 '22 08:09