Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Non-Random Weighted Distribution

Tags:

c#

algorithm

I currently have a system where the server tells all client applications when to next connect to the server between a server configured time window (say 12 to 6 am client time).

The current algorithm does a mod of the client's 10 digit ID number(fairly distributed) by the number of seconds in the time window and gives a pretty evenly distributed, predictable time for each client to connect to the server. The problem now, is that clients are in different time zones dis-proportionately, and certain time zones overlap for the given window, so the net effect is that the load is not distributed evenly on the server. What I would like is to devise an algorithm that I could configure with a percentage of clients we currently have for each time zone, and have it distribute the client's next connect time between the window that results in an even load on the server in a manner that is predictable (non-random).

here is a simple graphical representation:

                            12AM 1AM 2AM 3AM 4AM 5AM 6AM GMT
GMT -4  40% of the clients  ||||||||||||||||||||||||||||||            
GMT -5  10% of the clients       ||||||||||||||||||||||||||||||        
GMT -6  20% of the clients           ||||||||||||||||||||||||||||||    
GMT -7  30% of the clients               ||||||||||||||||||||||||||||||
like image 340
John Lemp Avatar asked Nov 20 '09 14:11

John Lemp


2 Answers

Break the problem into two parts: (1) determining what distribution you want each set of clients to have; and (2) deterministically assigning reconnect times that fit that distribution.

For problem (1), consider a two-dimensional array of numbers, much like the diagram you've drawn: each row represents a time zone and each column represents an equal period of time (an hour, perhaps) during the day. The problem you have to solve is to fill in the grid with numbers such that

  • the total of each row is the number of clients in that time zone;
  • for each row, all the numbers outside that time zone's reconnect window are zero;
  • the sums of the columns do not exceed some predetermined maximum (and are as evenly balanced as possible).

This kind of problem has lots of solutions. You can find one by simulation without doing any hard math. Write a program that fills the grid in so that each time zone's clients are evenly distributed (that is, the way you're distributing them now) and then repeatedly moves clients horizontally from crowded times-of-day to less crowded ones.

For problem (2), you want a function that takes a ten-digit ID and a desired distribution (that is, one row of the matrix from problem 1 above), and deterministically produces a reconnect time. This is easily done by linear interpolation. Suppose the desired distribution is:

12:00    1:00   2:00   3:00   4:00   5:00   6:00 ...
  +------+------+------+------+------+------+----
  |    0 |    0 |  100 |   70 |   30 |    0 |   ...
  +------+------+------+------+------+------+----

First find the sum of the whole row, and scale the numbers up to the range of IDs. That is, divide by the sum and multiply by 1010.

12:00    1:00   2:00        3:00       4:00        5:00   6:00 ...
  +------+------+-----------+-----------+-----------+------+----
  |    0 |   0  | 500000000 | 350000000 | 150000000 |    0 |   ...
  +------+------+-----------+-----------+-----------+------+----

Now let x = the ten-digit ID, and read the row from left to right. At each box, subtract the value in that box from x. Keep going until the number in the box is greater than what's left in x. Return the time

(start time for this box) + (duration of this box) * x / (number in box)

Note that once you calculate the solution to problem (1), the reconnect times will be deterministic until the next time you recalculate the matrix. Then everyone's reconnect time will shift around a little--but not much, unless the matrix changes dramatically.

like image 73
Jason Orendorff Avatar answered Sep 23 '22 14:09

Jason Orendorff


You could take into account the time zone of the user in addition to its ID.

One example solution that uses this would be the following:

There are 24 time zones. Calculate which relative load there is for each of the time zones. You can do this by summing the total number of clients from each time zone from your static data. Now you have "weighted time zones". Each time zone will get a time share proportional to its weight.

For example, if you have the following data (for simplicity, lets assume that there are only three time zones):

Time Zone | Clients num
------------------------
    0     |     20
    1     |     30
    2     |     10

Then you would divide your time interval size by 60, and give each of the time zones its share of time: the first time zone will get (20/60 * #time), the second will get the following (30/60 * #time) etc.

Once you have the smaller time frames, you can tell each client of its time according to your previous function (i.e. mod for example), using the smaller interval according to what you calculated for its specific time zone.

Notes:

  1. Obviously, you will need some minimum clients num value for time zones that are very low on traffic, but this is simple - you just edit the original table.
  2. This is one example of "time division", you can modify this example to your need, for example you could have mutual time frames for several time zones.

EDIT:

Given the example you added to your question, you could apply this method the following way:

If I understand you correctly, you have 10 hours in which your server is active, and you would like for the load to be more or less equal for each of these hours. Meaning: in each of these hours you would like that 10% of the clients would access the server. Using the idea explained above, it is possible to divide the users non uniformly, so that for each time zone, there are hours with "more probability", and hours with "less probability". In your example, in the GMT-4 group, 10%/40% of the clients will access the server in the first hour: 12AM-01AM GMT. It is possible to calculate the load for each of the time zones, so that the total load for the server in every hour is 10%. There are many methods to do this - a greedy one will do. Once you have this, you know the weights for each of the time zones, and it should be clearer how to use the time sharing method described above.

like image 41
Anna Avatar answered Sep 22 '22 14:09

Anna