Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to flatten peak usage over time?

Tags:

c#

algorithm

I have an environment that serves many devices spread across 3 time zones by receiving and sending data during the wee hours of the night. The distribution of these devices was determined pseudo-randomly based on an identification number and a simple calculation using a modulo operation. The result of such a calculation creates an unnecessary artificial peak which consumes more resources than I'd like during certain hours of the night.

As part of our protocol I can instruct devices when to connect to our system on subsequent nights.

I am seeking an algorithm which can generally distribute the peak into a more level line (albeit generally higher at most times) or at least a shove in the right direction - meaning what sort of terminology should I spend my time reading about. I have available to me identification numbers for devices, the current time, and the device's time zone as inputs for performing calculation. I can also perform some up front analytical calculations to create pools from which to draw slots from, though I feel this approach may be less elegant than I am hoping for (though a learning algorithm may not be a bad thing...).

(Ultimately and somewhat less relevant I will be implementing this algorithm using C#.)

like image 318
cfeduke Avatar asked Nov 09 '09 22:11

cfeduke


3 Answers

If you want to avoid the spikes associated with using random times, look at the various hashing functions used for hashtables. Your reading might start at the wikipedia articles on the subject:

http://en.wikipedia.org/wiki/Hash_function

Basically, divide whatever you want your update window to be into the appropriate number of buckets. One option might be 3 hours * 60 minutes * 60 seconds = 10800 buckets. Then use that as your hashtable size, for the chosen hashing function. Your unique input might be device ID. Don't forget to use GMT for the chosen time. Your programming language of choice probably has a number of built in hashing functions, but the article should provide some links to get you started if you want to implement one from scratch.

This approach is superior to the earlier answer of random access times because it has much better evenness properties, and ensures that your access patterns will be approximately flat, as compared to the random function which is likely to sometimes exhibit spikes.

Here's some more specific information on how to implement various functions:

http://www.partow.net/programming/hashfunctions/index.html

like image 97
Paul McMillan Avatar answered Sep 26 '22 00:09

Paul McMillan


You say that you can tell devices what time to connect, so I don't see why you need anything random or modulused. When each device connects, pick a time tomorrow which currently doesn't have many devices assigned to it, and assign the device to that time. If the devices all take about the same amount of resources to service, then a trivial greedy algorithm will produce a completely smooth distribution - assign each device to whatever time is currently least congested. If the server handles other work than just these devices, then you'd want to start with its typical load profile, then add the device load to that. I wouldn't really call this "analytical calculations", just storing a histogram of expected load against time for the next 24 hours.

Or do you have the problem that the device might not obey instructions (for example it might be offline at its assigned time, and then connect whenever it's next on)? Obviously if your users in a particular time zone all start work at the same time in the morning, then that would be a problematic strategy.

like image 25
Steve Jessop Avatar answered Sep 27 '22 00:09

Steve Jessop


Simply take the number of devices and divide your time interval into n equal segments and allocate each segment to a device, informing them of when to connect when they next connect.

This will give you an optimally uniform distribution in all cases.

Normalize all times to GMT, what do you care about time zones or day light savings time or whatever? Now is now no matter what time zone you're in.

Adding a random distribution can lead to clumping (a uniform random distribution is only uniform in the limit, but not necessarily for any particular sample), and really should be used if there's no feedback mechanism. Since you can control to some extent when they connect a random component is not at all necessary and is not even remotely optimal.

If you're concerned about clock drift across devices consider even if you added randomness this wouldn't decrease the randomness of your clock drift in any way, and would only contribute to an even less optimal allocation.

If you want to ensure a stable distribution of devices by region, then compute the ratio of devices per region, and distribute the slot allocations appropriately. For instance, if you have 50/25/25 by time zone respectively, assign slots to the first time zone, then the next two slots to the remaining time zones, then repeat.

like image 26
groundhog Avatar answered Sep 26 '22 00:09

groundhog