I currently have a system where the server tells all client applications when to next connect to the server between a server configured time window (say 12 to 6 am client time).
The current algorithm does a mod of the client's 10 digit ID number(fairly distributed) by the number of seconds in the time window and gives a pretty evenly distributed, predictable time for each client to connect to the server. The problem now, is that clients are in different time zones dis-proportionately, and certain time zones overlap for the given window, so the net effect is that the load is not distributed evenly on the server. What I would like is to devise an algorithm that I could configure with a percentage of clients we currently have for each time zone, and have it distribute the client's next connect time between the window that results in an even load on the server in a manner that is predictable (non-random).
here is a simple graphical representation:
12AM 1AM 2AM 3AM 4AM 5AM 6AM GMT
GMT -4 40% of the clients ||||||||||||||||||||||||||||||
GMT -5 10% of the clients ||||||||||||||||||||||||||||||
GMT -6 20% of the clients ||||||||||||||||||||||||||||||
GMT -7 30% of the clients ||||||||||||||||||||||||||||||
Break the problem into two parts: (1) determining what distribution you want each set of clients to have; and (2) deterministically assigning reconnect times that fit that distribution.
For problem (1), consider a two-dimensional array of numbers, much like the diagram you've drawn: each row represents a time zone and each column represents an equal period of time (an hour, perhaps) during the day. The problem you have to solve is to fill in the grid with numbers such that
This kind of problem has lots of solutions. You can find one by simulation without doing any hard math. Write a program that fills the grid in so that each time zone's clients are evenly distributed (that is, the way you're distributing them now) and then repeatedly moves clients horizontally from crowded times-of-day to less crowded ones.
For problem (2), you want a function that takes a ten-digit ID and a desired distribution (that is, one row of the matrix from problem 1 above), and deterministically produces a reconnect time. This is easily done by linear interpolation. Suppose the desired distribution is:
12:00 1:00 2:00 3:00 4:00 5:00 6:00 ...
+------+------+------+------+------+------+----
| 0 | 0 | 100 | 70 | 30 | 0 | ...
+------+------+------+------+------+------+----
First find the sum of the whole row, and scale the numbers up to the range of IDs. That is, divide by the sum and multiply by 1010.
12:00 1:00 2:00 3:00 4:00 5:00 6:00 ...
+------+------+-----------+-----------+-----------+------+----
| 0 | 0 | 500000000 | 350000000 | 150000000 | 0 | ...
+------+------+-----------+-----------+-----------+------+----
Now let x = the ten-digit ID, and read the row from left to right. At each box, subtract the value in that box from x. Keep going until the number in the box is greater than what's left in x. Return the time
(start time for this box) + (duration of this box) * x / (number in box)
Note that once you calculate the solution to problem (1), the reconnect times will be deterministic until the next time you recalculate the matrix. Then everyone's reconnect time will shift around a little--but not much, unless the matrix changes dramatically.
You could take into account the time zone of the user in addition to its ID.
One example solution that uses this would be the following:
There are 24 time zones. Calculate which relative load there is for each of the time zones. You can do this by summing the total number of clients from each time zone from your static data. Now you have "weighted time zones". Each time zone will get a time share proportional to its weight.
For example, if you have the following data (for simplicity, lets assume that there are only three time zones):
Time Zone | Clients num
------------------------
0 | 20
1 | 30
2 | 10
Then you would divide your time interval size by 60, and give each of the time zones its share of time: the first time zone will get (20/60 * #time), the second will get the following (30/60 * #time) etc.
Once you have the smaller time frames, you can tell each client of its time according to your previous function (i.e. mod for example), using the smaller interval according to what you calculated for its specific time zone.
Notes:
EDIT:
Given the example you added to your question, you could apply this method the following way:
If I understand you correctly, you have 10 hours in which your server is active, and you would like for the load to be more or less equal for each of these hours. Meaning: in each of these hours you would like that 10% of the clients would access the server. Using the idea explained above, it is possible to divide the users non uniformly, so that for each time zone, there are hours with "more probability", and hours with "less probability". In your example, in the GMT-4 group, 10%/40% of the clients will access the server in the first hour: 12AM-01AM GMT. It is possible to calculate the load for each of the time zones, so that the total load for the server in every hour is 10%. There are many methods to do this - a greedy one will do. Once you have this, you know the weights for each of the time zones, and it should be clearer how to use the time sharing method described above.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With