Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scheduling Algorithm with limitations

Tags:

algorithm

Thanks to user3125280, D.W. and Evgeny Kluev the question is updated.

I have a list of webpages and I must download them frequently, each webpage got a different download frequency. Based on this frequency we group the webpages in 5 groups:

Items in group 1 are downloaded once per 1 hour
items in group 2 once per 2 hours
items in group 3 once per 4 hours
items in group 4 once per 12 hours
items in group 5 once per 24 hours

This means, we must download all the group 1 webpages in 1 hour, all the group 2 in 2 hours etc.

I am trying to make an algorithm. As input, I have:

a) DATA_ARR = one array with 5 numbers. Each number represents the number of items in this group.

b) TIME_ARR = one array with 5 numbers (1, 2, 4, 12, 24) representing how often the items will be downloaded.

b) X = the total number of webpages to download per hour. This is calculated using items_in_group/download_frequently and rounded upwards. If we have 15 items in group 5, and 3 items in group 4, this will be 15/24 + 3/12 = 0.875 and rounded is 1.

Every hour my program must download at max X sites. I expect the algorithm to output something like:

Hour 1: A1 B0 C4 D5
Hour 2: A2 B1 C2 D2
...

A1 = 2nd item of 1st group
C0 = 1st item of 3rd group

My algorithm must be as efficient as possible. This means:

a) the pattern must be extendable to at least 200+ hours
b) no need to create a repeatable pattern
c) spaces are needed when possible in order to use the absolute minimum bandwidth
d) never ever download an item more often than the update frequency, no exceptions


Example:

group 1: 0 items | once per 1 hour
group 2: 3 items | once per 2 hours
group 3: 4 items | once per 4 hours
group 4: 0 items | once per 12 hours
group 5: 0 items | once per 24 hours

We calculate the number of items we can take per hour: 3/2+4/4 = 2.5. We round this upwards and it's 3.

Using pencil and paper, we can found the following solution:

Hour 1: B0 C0 B1
Hour 2: B2 C1 c2
Hour 3: B0 C3 B1
Hour 4: B2
Hour 5: B0 C0 B1
Hour 6: B2 C1 c2
Hour 7: B0 C3 B1
Hour 8: B2
Hour 9: B0 C0 B1
Hour 10: B2 C1 c2
Hour 11: B0 C3 B1
Hour 12: B2
Hour 13: B0 C0 B1
Hour 14: B2 C1 c2
and continue the above.

We take C0, C1 C2, and C3 once every 4 hours. We also take B0, B1 and B2 once every 2 hours.


Question: Please, explain to me, how to design an algorithm able to download the items, while using the absolute minimum number of downloads? Brute force is NOT a solution and the algorithm must be efficient CPU wise because the number of elements can be huge.

You may read the answer posted here: https://cs.stackexchange.com/a/19422/12497 as well as the answer posted bellow by user3125280.

like image 435
Luka Avatar asked Dec 30 '13 04:12

Luka


People also ask

What is the limitation of Round Robin scheduling algorithm?

Disadvantages of Round Robin AlgorithmLow slicing time reduces processor output. Spends more time on context switching. Performance depends on time quantum. Processes don't have priorities.

What are 4 major scheduling algorithms?

Six types of process scheduling algorithms are: First Come First Serve (FCFS), 2) Shortest-Job-First (SJF) Scheduling, 3) Shortest Remaining Time, 4) Priority Scheduling, 5) Round Robin Scheduling, 6) Multilevel Queue Scheduling.

What is the drawback of SJF scheduling?

Disadvantages of SJF Used for long-term scheduling in a batch system. Can't implement this algorithm for CPU scheduling for the short term as we can't predict the length of the upcoming CPU burst. Cause extremely long turnaround times or starvation. Knowledge about the runtime length of a process is necessary.


1 Answers

You problem is a typical scheduling problem. These kinds of problems are well studied in computer science so there is a huge array of literature to consult.

The code is kind of like Deficit round robin, but with a few simplifications. First, we feed the queues ourself by adding to the data_to_process variable. Secondly, the queues just iterate through a list of values.

One difference is that this solution will get the optimal value you want, barring mathematical error.

Rough sketch: have not compiled (c++11) unix based, to spec code

#include <iostream>
#include <vector>
#include <numeric>
#include <unistd.h>
//#include <cmath> //for ceil

#define TIME_SCALE ((double)60.0) //1 for realtime speed

//Assuming you are not refreshing ints in the real case
template<typename T>
struct queue
{
    const std::vector<T> data; //this will be filled with numbers
    int position;

    double refresh_rate; //must be refreshed ever ~ hours
    double data_rate; //this many refreshes per hour
    double credit; //amount of refreshes owed

    queue(std::initializer_list<T> v, int r ) :
        data(v), position(0), refresh_rate(r), credit(0) {
        data_rate = data.size() / (double) refresh_rate;
    }

    int getNext() {
        return data[position++ % data.size()];
    }
};

double time_passed(){
static double total;
//if(total < 20){ //stop early
    usleep(60000000 / TIME_SCALE); //sleep for a minute
    total += 1.0 / 60.0; //add a minute
    std::cout << "Time: " << total << std::endl;
    return 1.0; //change to 1.0 / 60.0 for real time speed
//} else return 0;
}

int main()
{
    //keep a list of the queues
    std::vector<queue<int> > queues{
    {{1, 2, 3}, 2},
    {{1, 2, 3, 4}, 3}};

    double total_data_rate = 0;
    for(auto q : queues) total_data_rate += q.data_rate;

    double data_to_process = 0; //how many refreshes we have to do
    int queue_number = 0; //which queue we are processing

    auto current_queue = &queues[0];

    while(1) {
        data_to_process += time_passed() * total_data_rate;
        //data_to_process = ceil(data_to_process) //optional

        while(data_to_process >= 1){
            //data_to_process >= 0 will make the the scheduler more
            //eager in the first time period (ie. everything will updated correctly
            //in the first period and and following periods
            if(current_queue->credit >= 1){
            //don't change here though, since credit determines the weighting only,
            //not how many refreshes are made
                //refresh(current_queue.getNext();
                std::cout << "From queue " << queue_number << " refreshed " <<
                current_queue->getNext() << std::endl;
                current_queue->credit -= 1;
                data_to_process -= 1;
            } else {
                queue_number = (queue_number + 1) % queues.size();
                current_queue = &queues[queue_number];
                current_queue->credit += current_queue->data_rate;
            }
        }
    }
   return 0;
}

The example should now compile on gcc with --std=c++11 and give you what you want.

and here is test case output: (for non-time scaled earlier code)

Time: 0
From queue 1 refreshed 1
From queue 0 refreshed 1
From queue 1 refreshed 2
Time: 1
From queue 0 refreshed 2
From queue 0 refreshed 3
From queue 1 refreshed 3
Time: 2
From queue 0 refreshed 1
From queue 1 refreshed 4
From queue 1 refreshed 1
Time: 3
From queue 0 refreshed 2
From queue 0 refreshed 3
From queue 1 refreshed 2
Time: 4
From queue 0 refreshed 1
From queue 1 refreshed 3
From queue 0 refreshed 2
Time: 5
From queue 0 refreshed 3
From queue 1 refreshed 4
From queue 1 refreshed 1

As an extension, to answer the repeating pattern problem by allowing this scheduler to complete only the first lcm(update_rate * lcm(...refresh rates...), ceil(update_rate)) steps, and then repeating the pattern.

ALSO: this will, indeed, be unsolvable sometimes because of the requirement on hour boundaries. When I use your unsolvable example, and modify time_passed to return 0.1, the schedule is solved with updates every 1.1 hours (just not at the hour boundaries!).

like image 167
user3125280 Avatar answered Oct 22 '22 10:10

user3125280