Algorithm to find optimal groups

A device contains an array of locations, some of which contain values that we want to read periodically.

Our list of locations that we want to read periodically also specifies how often we want to read them. It is permitted to read a value more frequently than specified, but not less frequently.

A single read operation can read a contiguous sequence of locations from the array, so it is possible to return a group of multiple values from one read operation. The maximum number of contiguous locations that can be read in a single operation is M.

The goal is to group locations so as to minimize the time-averaged number of read operations. In the event that there is more than one way to do this, the tie-breaker is to minimize the time-averaged number of locations read.
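To make the objective concrete, here is a small Python sketch of the two cost metrics. The representation (a list of groups, each a list of (index, period) pairs) and the function name are my own invention, not part of the problem statement:

```python
M = 6  # maximum number of contiguous locations per read

def grouping_cost(groups):
    """Return (group reads/sec, locations read/sec) for a grouping.

    Each group is a list of (index, period) pairs.  A group must be
    read at the fastest (smallest) period of any of its members, and
    one read covers every location from its first to its last index.
    """
    reads = 0.0
    locations = 0.0
    for group in groups:
        lo = min(i for i, _ in group)
        hi = max(i for i, _ in group)
        assert hi - lo < M, "group is wider than one read allows"
        period = min(p for _, p in group)   # fast enough for every member
        reads += 1.0 / period
        locations += (hi - lo + 1) / period
    return reads, locations
```

The goal is then to minimise the first element of this pair, with the second as tie-breaker. For instance, the three-group solution further down costs 0.2 + 1 + 0.2 = 1.4 group reads per second.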

(Bonus points are awarded if the algorithm to do this allows incremental changes to the list of locations - i.e. adding or removing one location to/from the list doesn't require the groupings to be recalculated from scratch!)

I'll try to clarify this with some examples where M=6.

The following diagram shows the array of locations. The numbers represent the desired read period for that location.

| 1 | 1 |   |   | 1 |   |   |   |   |   | 5 |   | 2 |
\-------------------/                   \-----------/
     group A                               group B

In this first example group A is read every second and group B every 2 seconds. Note that the location that should be read every 5 seconds is actually read every 2 seconds - which is fine.

| 1 |   |   |   |   | 1 | 1 |   | 1 |
\-----------------------/\----------/
         group A            group B         (non-optimal!)

This example shows a failure of my initial simple-minded algorithm, which was to fill up the first group until it was full and then start another. The following grouping is better because, although the number of group reads per second is the same, the number of locations read in those groups is smaller:

| 1 |   |   |   |   | 1 | 1 |   | 1 |
\---/               \---------------/
group A                  group B            (optimal)
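For comparison, here's a sketch (Python; the names are my own) of that fill-until-full approach, which, given the locations in this example, produces the non-optimal grouping shown first:

```python
def greedy_groups(indices, M=6):
    """Scan locations left to right, adding each one to the current
    group until it no longer fits within a window of M contiguous
    locations, then start a new group."""
    groups = []
    for i in sorted(indices):
        if groups and i - groups[-1][0] < M:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups
```

With locations at indices 0, 5, 6 and 8 it returns [[0, 5], [6, 8]], i.e. the non-optimal split.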

Finally, an example where three groups is better than two:

| 5 |   |   |   |   | 1 | 1 |   |   |   |   | 5 |
\-----------------------/\----------------------/
        group A                  group B    (non-optimal)

This solution requires two group reads per second. A better solution is as follows:

| 5 |   |   |   |   | 1 | 1 |   |   |   |   | 5 |
\---/               \-------/               \---/
group A              group B               group C

This requires two reads every 5 seconds (groups A and C) plus one every second (group B): 1.4 group reads per second.

Edit: (There is an even better solution to this example if you allow reads to be non-periodic. On the 1st second, read both groups of the first solution. On seconds 2, 3, 4 and 5 read group B of the second solution. Repeat. This results in 1.2 group reads per second. But I'm going to disallow this because it would make the code responsible for scheduling the reads much more complicated.)

I looked up clustering algorithms but this isn't a clustering problem. I also found Algorithm to allocate a list of numbers to N groups under certain condition, which pointed to the 'Bin packing' problem, but I don't think this is it either.

By the way, sorry for the vague title. I can't think of a concise description, or even relevant search keywords!

New examples added 28 September 2010:

This is like the previous example, but with all items updating at the same rate. Now two groups are better than three:

| 1 |   |   |   |   | 1 | 1 |   |   |   |   | 1 |
\-----------------------/\----------------------/
        group A                  group B          (optimal)

I've started trying to see how iterative improvements might be implemented. Suppose a grouping algorithm came up with:

| 1 |   |   |   |   | 1 | 1 |   |   |   |   | 1 | 1 |   |   |   |   | 1 |
\---/               \-------/               \-------/               \---/
group A              group B                 group C               group D  (non-optimal)
\-----------------------/\----------------------/\----------------------/
        group A                  group B                  group C           (optimal)

This can be improved to three adjacent groups of 6 each. Rex suggested (comment below) that I could try combining triplets into pairs. But in this case I would have to combine a quartet into a triplet, because there is no legal intermediate arrangement in which A+B+C (or B+C+D) can be rearranged into a pair while leaving the fourth group as it is.

I originally thought that this was an indication that in the general case there is no guarantee that a new valid solution can be created from an existing valid solution by making a local modification. This would have meant that algorithms such as simulated annealing, genetic algorithms, etc., could not be used reliably to refine a suboptimal solution.

But Rex pointed out (comment below) that you can always split an existing group into two. Despite the fact that this always increases the cost function, all that means is that the solution needs to get out of its local minimum in order to reach the global minimum.
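A sketch of that always-available split move (Python; I'm assuming here that a group is just a sorted list of location indices):

```python
def split_group(groups, g, k):
    """Split group g before its k-th member.  The result is always a
    valid grouping: each half spans no more than the original did."""
    group = groups[g]
    assert 0 < k < len(group), "both halves must be non-empty"
    return groups[:g] + [group[:k], group[k:]] + groups[g + 1:]
```

Any such split keeps every group within the M-location limit, so it is always a legal, if costlier, neighbouring solution.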

Asked Sep 14 '10 by Ian Goldby

1 Answer

This problem has the same property of instability on addition of new items that similar NP-complete problems do, so I assume it is one also. Since I suspect that you want something that works reasonably well instead of a proof of why it's hard, I'll focus on an algorithm to give an approximate solution.

I would solve this problem by converting it into a graph in which each location is valued at 1/N if it has to be read every N seconds, then blurring the graph with a width of M (e.g. 6), peaked at the original location. (For M=6, I might use the weighting (1/6 1/5 1/4 1/3 1/2 1 1/2 1/3 1/4 1/5 1/6).) Then place reads at all the local maxima (sort pairs of maxima by distance apart and cover closely-spaced pairs first if you can). Now you'll have most of your most important values covered. Then catch any missing locations by extending the existing reads, or by adding new reads if necessary. Depending on the structure, you may want to refine further by shifting locations between reads, but if you're lucky that won't even be necessary.
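A rough Python sketch of the blur-and-peak step (the helper names, the 1/(d+1) kernel cut off at distance M-1, and the plateau handling are my reading of the description, not code from the answer):

```python
M = 6  # maximum read width

def blurred(periods, scale=60):
    """Blur a sparse map {index: period} into a dense graph.

    Each location contributes scale/period at its own index, falling
    off as 1/(distance + 1) out to distance M - 1 on either side,
    i.e. the kernel (1/6 1/5 1/4 1/3 1/2 1 1/2 1/3 1/4 1/5 1/6).
    """
    n = max(periods) + 1
    graph = [0.0] * n
    for i, period in periods.items():
        weight = scale / period
        for d in range(-(M - 1), M):
            if 0 <= i + d < n:
                graph[i + d] += weight / (abs(d) + 1)
    return graph

def local_maxima(graph):
    """Indices at least as large as both neighbours (plateaus count)."""
    n = len(graph)
    return [i for i in range(n)
            if (i == 0 or graph[i] >= graph[i - 1])
            and (i == n - 1 or graph[i] >= graph[i + 1])]
```

Running this on the two worked examples below (with scale 60 and 300 respectively) reproduces the blurred sums and finds the peaks at the marked locations.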

Since this is essentially a local algorithm, if you keep track of the blurred graph, you can fairly easily add new items and re-do the peak-covering locally (and the refinement locally).

Just to see how this would work on your data, the two-group case would look like (multiplying by 60 so I don't have to keep track of fractional weights)

 60 30 20 15 12 10 00 00 00   <- contribution from left-most location
 10 12 15 20 30 60 30 20 15   <- second
 00 10 12 15 20 30 60 30 20   <- third
 00 00 00 10 12 15 20 30 60   <- rightmost
 --------------------------
 70 52 47 60 74 B5 B0 80 95   (using "B" to represent 11)
 ^^             ^^       ^^   Local maxima
   -------------  -------
     dist=6        dist=4
               |===========|  <- Hit closely-spaced peaks first
|==|                          <- Then remaining

So we're done, and the solution is optimal.

For the three-group example, weighting "5" as "1/5" and multiplying everything by 300 so again there are no fractions:

060 030 020 015 012 010 000 000 000 000 000 000   <- from 5 on left
050 060 075 100 150 300 150 100 075 060 050 000   <- 1 on left
000 050 060 075 100 150 300 150 100 075 060 050   <- on right
000 000 000 000 000 000 010 012 015 020 030 060   <- 5 on right
-----------------------------------------------
110 140 155 190 262 460 460 262 190 155 140 110
                   |=======|                      <- only one peak, grab it
===                                         ===   <- missed some, so pick them back up
Answered Nov 15 '22 by Rex Kerr