Scheduling Algorithm with limitations

Tags:

algorithm

Thanks to user3125280, D.W. and Evgeny Kluev the question is updated.

I have a list of webpages and I must download them frequently, each webpage got a different download frequency. Based on this frequency we group the webpages in 5 groups:

Items in group 1 are downloaded once per 1 hour
items in group 2 once per 2 hours
items in group 3 once per 4 hours
items in group 4 once per 12 hours
items in group 5 once per 24 hours

This means, we must download all the group 1 webpages in 1 hour, all the group 2 in 2 hours etc.

I am trying to make an algorithm. As input, I have:

a) DATA_ARR = one array with 5 numbers. Each number represents the number of items in this group.

b) TIME_ARR = one array with 5 numbers (1, 2, 4, 12, 24) representing how often the items will be downloaded.

b) X = the total number of webpages to download per hour. This is calculated using items_in_group/download_frequently and rounded upwards. If we have 15 items in group 5, and 3 items in group 4, this will be 15/24 + 3/12 = 0.875 and rounded is 1.

Every hour my program must download at max X sites. I expect the algorithm to output something like:

Hour 1: A1 B0 C4 D5
Hour 2: A2 B1 C2 D2
...

A1 = 2nd item of 1st group
C0 = 1st item of 3rd group

My algorithm must be as efficient as possible. This means:

a) the pattern must be extendable to at least 200+ hours
b) no need to create a repeatable pattern
c) spaces are needed when possible in order to use the absolute minimum bandwidth
d) never ever download an item more often than the update frequency, no exceptions

Example:

group 1: 0 items | once per 1 hour
group 2: 3 items | once per 2 hours
group 3: 4 items | once per 4 hours
group 4: 0 items | once per 12 hours
group 5: 0 items | once per 24 hours

We calculate the number of items we can take per hour: 3/2+4/4 = 2.5. We round this upwards and it's 3.

Using pencil and paper, we can found the following solution:

Hour 1: B0 C0 B1
Hour 2: B2 C1 c2
Hour 3: B0 C3 B1
Hour 4: B2
Hour 5: B0 C0 B1
Hour 6: B2 C1 c2
Hour 7: B0 C3 B1
Hour 8: B2
Hour 9: B0 C0 B1
Hour 10: B2 C1 c2
Hour 11: B0 C3 B1
Hour 12: B2
Hour 13: B0 C0 B1
Hour 14: B2 C1 c2
and continue the above.

We take C0, C1 C2, and C3 once every 4 hours. We also take B0, B1 and B2 once every 2 hours.

Question: Please, explain to me, how to design an algorithm able to download the items, while using the absolute minimum number of downloads? Brute force is NOT a solution and the algorithm must be efficient CPU wise because the number of elements can be huge.

You may read the answer posted here: https://cs.stackexchange.com/a/19422/12497 as well as the answer posted bellow by user3125280.

435

asked Dec 30 '13 04:12

Luka

1 Answers

You problem is a typical scheduling problem. These kinds of problems are well studied in computer science so there is a huge array of literature to consult.

The code is kind of like Deficit round robin, but with a few simplifications. First, we feed the queues ourself by adding to the data_to_process variable. Secondly, the queues just iterate through a list of values.

One difference is that this solution will get the optimal value you want, barring mathematical error.

~~Rough sketch: have not compiled (c++11)~~ unix based, to spec code

#include <iostream>
#include <vector>
#include <numeric>
#include <unistd.h>
//#include <cmath> //for ceil

#define TIME_SCALE ((double)60.0) //1 for realtime speed

//Assuming you are not refreshing ints in the real case
template<typename T>
struct queue
{
    const std::vector<T> data; //this will be filled with numbers
    int position;

    double refresh_rate; //must be refreshed ever ~ hours
    double data_rate; //this many refreshes per hour
    double credit; //amount of refreshes owed

    queue(std::initializer_list<T> v, int r ) :
        data(v), position(0), refresh_rate(r), credit(0) {
        data_rate = data.size() / (double) refresh_rate;
    }

    int getNext() {
        return data[position++ % data.size()];
    }
};

double time_passed(){
static double total;
//if(total < 20){ //stop early
    usleep(60000000 / TIME_SCALE); //sleep for a minute
    total += 1.0 / 60.0; //add a minute
    std::cout << "Time: " << total << std::endl;
    return 1.0; //change to 1.0 / 60.0 for real time speed
//} else return 0;
}

int main()
{
    //keep a list of the queues
    std::vector<queue<int> > queues{
    {{1, 2, 3}, 2},
    {{1, 2, 3, 4}, 3}};

    double total_data_rate = 0;
    for(auto q : queues) total_data_rate += q.data_rate;

    double data_to_process = 0; //how many refreshes we have to do
    int queue_number = 0; //which queue we are processing

    auto current_queue = &queues[0];

    while(1) {
        data_to_process += time_passed() * total_data_rate;
        //data_to_process = ceil(data_to_process) //optional

        while(data_to_process >= 1){
            //data_to_process >= 0 will make the the scheduler more
            //eager in the first time period (ie. everything will updated correctly
            //in the first period and and following periods
            if(current_queue->credit >= 1){
            //don't change here though, since credit determines the weighting only,
            //not how many refreshes are made
                //refresh(current_queue.getNext();
                std::cout << "From queue " << queue_number << " refreshed " <<
                current_queue->getNext() << std::endl;
                current_queue->credit -= 1;
                data_to_process -= 1;
            } else {
                queue_number = (queue_number + 1) % queues.size();
                current_queue = &queues[queue_number];
                current_queue->credit += current_queue->data_rate;
            }
        }
    }
   return 0;
}

The example should now compile on gcc with --std=c++11 and give you what you want.

and here is test case output: (for non-time scaled earlier code)

Time: 0
From queue 1 refreshed 1
From queue 0 refreshed 1
From queue 1 refreshed 2
Time: 1
From queue 0 refreshed 2
From queue 0 refreshed 3
From queue 1 refreshed 3
Time: 2
From queue 0 refreshed 1
From queue 1 refreshed 4
From queue 1 refreshed 1
Time: 3
From queue 0 refreshed 2
From queue 0 refreshed 3
From queue 1 refreshed 2
Time: 4
From queue 0 refreshed 1
From queue 1 refreshed 3
From queue 0 refreshed 2
Time: 5
From queue 0 refreshed 3
From queue 1 refreshed 4
From queue 1 refreshed 1

As an extension, to answer the repeating pattern problem by allowing this scheduler to complete only the first lcm(update_rate * lcm(...refresh rates...), ceil(update_rate)) steps, and then repeating the pattern.

ALSO: this will, indeed, be unsolvable sometimes because of the requirement on hour boundaries. When I use your unsolvable example, and modify time_passed to return 0.1, the schedule is solved with updates every 1.1 hours (just not at the hour boundaries!).

167

answered Oct 22 '22 10:10

user3125280

Related questions
                            
                                What are good algorithms for detecting abnormality?
                            
                                Finding the most common three-item sequence in a very large file
                            
                                Performance with time related algorithm
                            
                                Algorithm to minimize vertex distances - Dwarf Fortress
                            
                                Place "sum" and "multiply" operators between the elements of a given list of integers so that the expression results in a specified value
                            
                                What is that optimization algorithm called?
                            
                                Fair partitioning of elements of a list
                            
                                Can a non binary tree be tranversed in order?
                            
                                What is the most efficient pattern/algorithm to compare two lists and find the delta between those two lists?
                            
                                Clustering with a distance matrix
                            
                                Why doesn't this implementation of Jarvis' March ("Gift wrapping algorithm") work?
                            
                                "Programming Pearls" binary search help
                            
                                Hotel room optimization/sorting algorithm
                            
                                Brent's cycle detection algorithm
                            
                                Given a number, find whether it is brilliant or not
                            
                                In-order traversal complexity in a binary search tree (using iterators)?
                            
                                Finding minimum cost in a binary matrix
                            
                                Number of Distinct Subarrays
                            
                                Graph Implementations: why not use hashing?
                            
                                python prime factorization performance

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With