Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL efficient schedule generation algorithm

The idea

Imagine education center which has branches. Courses of this education center are common for all branches.

Branches

CREATE TABLE `Branch` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;


CREATE TABLE `Course` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `active` tinyint(1) DEFAULT '1',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8;

Rooms in every branch for each course generated by administrators. For example, administrator enters count of rooms for Math course. System generates 3 rooms. In other words they are limited by count.

CREATE TABLE `Room` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `branch_id` int(10) unsigned DEFAULT NULL,
  `course_id` int(10) unsigned DEFAULT NULL,
  `occupied_hours` tinyint(1) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8;

Every room has 5 available teaching hours per day. In other words, Math-1 will have 1 different student group in each teaching hour (of 5).

Students -also grouped by branches. Every student has preffered weekly plan (week_day_mode) to come to secondary school.

  • for 1, 3, 5th days of the week
  • for 2, 4, 6th days of the week

class field is grade in school (main school),

CREATE TABLE `Student` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `fullname` varchar(255) NOT NULL,
  `class` tinyint(2) DEFAULT NULL,
  `branchID` int(10) unsigned DEFAULT NULL,
  `week_day_mode` tinyint(1) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `branchID` (`branchID`)
) ENGINE=InnoDB AUTO_INCREMENT=246 DEFAULT CHARSET=utf8;

When administrator registers student for the first time, he selects all the courses that student wants to participate. For example, if 5 courses selected StudentCourseAssoc will be filled with 5 rows for this student. After testing student for basic knowledge level for each course, administrator evaluates student as "clever" (+1) or "dumb" (-1) on particular course. So knowledge_level is the value for student-course connection.

CREATE TABLE `StudentCourseAssoc` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `studentID` int(10) unsigned DEFAULT NULL,
  `courseID` int(10) unsigned DEFAULT NULL,
  `knowledge_level` tinyint(1) DEFAULT NULL,
  `group_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1144 DEFAULT CHARSET=utf8;

The application must:

Automatically group (it can create new group or add student to existing group) students of each branch with following conditions

  • Clever and dumb students must be in separate groups
  • Group may consist of some grade mixes. So, it's okay to mix 9th grade with 10th. And 11th with graduated (12th class means graduated in sql). But not 10th-11th. (There will be 2 mode: 9-10, 11-12)
  • Group may consist of 8 students max.
  • Course rooms are limited. So every room might host only 5 groups during day
  • Every student must seat on every chosen (by himself) course in 1 day

After searching for group that meets conditions above, if not found, app must create and then assign the student to the group. Then :

CREATE TABLE `StudentGroupAssoc` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `group_id` int(10) unsigned DEFAULT NULL,
  `student_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8;

CREATE TABLE `Schedule` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `group_id` int(10) unsigned DEFAULT NULL,
  `week_day_mode` tinyint(1) DEFAULT NULL,
  `hour` tinyint(1) DEFAULT NULL,
  `room_id` int(4) unsigned DEFAULT NULL,
  `teacher_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `Unique Room for exact time` (`week_day_mode`,`hour`,`room_id`) USING BTREE,
  UNIQUE KEY `Unique Group for exact time` (`group_id`,`week_day_mode`) USING BTREE,
  KEY `Unique Teacher for exact time` (`week_day_mode`,`hour`,`teacher_id`),
  KEY `room_id` (`room_id`),
  KEY `teacher_id` (`teacher_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

And here is fiddle to play with.

What I've done

I'm trying to place student to the group (either existing or to create new one) during knowledge evaluation. Like, if student choose Math as one of courses, when administrator evaluates his knowledge of Math and marks positive the procedure starts to pick right group for this student:

  • Function marks students knwowledge level
  • Checks available hours of student (say, 1th hour is already taken, then he has 4 available hours)
  • Adds class coverage condition to search (like 9-10th grades or 11-12 grades)
  • Checks schedule table if has any group in available hours in student's weekly plans

If there is no one, then attempts to create.

So PHP representation looks like this

        //sets knowledge level of student
        $studentCourse->knowledge_level = intval($_POST["mark"]);

        //check hours of student, and keep only available hours
        $availableHours = array_combine(range(1, 5), range(1, 5));

        //Unsets students unavailable hours from possible hours
        if ($student->GroupRels)
            foreach ($student->GroupRels as $groupRel)
                unset($availableHours[$groupRel->hour]);

        //Checks available groups based on class coverage
        if (in_array($student->class, ['11', 'G']))
            $classCoverage = "11-m";
        else if (in_array($student->class, ['9', '10']))
            $classCoverage = "9-10";

        $availableGroups = Group::find()
            ->with("schedule")
            ->where([
                    "Group.class_coverage" => $classCoverage,
                    "Group.knowledge_level" => $studentCourse->knowledge_level,
                    "Group.participiant_count<8",
                    "Schedule.hour" => $availableHours,
                    'Schedule.week_day_mode' => $student->week_day_mode
                ]
            )->all();


        if (count($availableGroups) > 0) {
             //Selecting one of groups
             //adding row to StudentGroupAssoc
            //adding row to Schedule
        } else {
            $group = new Group();
            $group->branch_id = $student->branchID;
            $group->class_coverage = $classCoverage;
            $group->course_id=$studentCourse->courseID;
            $group->knowledge_level=$studentCourse->knowledge_level;
            $group->save();
            ...
            //adding row to StudentGroupAssoc
            //adding row to Schedule


        }

The question is

Theoretically, the way that I'm doing is like buying ticket for airplane. Is errorless, and must work but it's not efficient and optimal. All of conditions of grouping must be met in a most efficient way: minimal group count and meeting limited room count policy. This methid soon will make plenty of groups which will not fit to available hours of rooms.

As I take hours of student one by one, (during evaluation process) it gets harder and harder to get really efficient results. Chance of not finding groups and not being to able to create new groups because of room limits is going up and up by taking hours of student.

What do you suggest to use to make use of every hour in every room?

UPDATE

Based on @norbert_van_nobelen 's answer I created 'dummy' hours table and following view to get all possible hour-room-course combinations list for each student.

hoursthe real to be planned hour hours_available is the binary switch. So in the real code we add a where clause: WHERE hours_available=0 to get only the hours we want to plan against:

SELECT
    `s`.`id` AS `student_id`,

IF ((ifnull(`sch`.`hour`, 0) > 0), 1, 0) AS `hour_available`,
 `d`.`hours` AS `hours`,
 `sca`.`courseID` AS `courseID`,
 `sch`.`room_id` AS `room_id`,
 `sca`.`knowledge_level` AS `knowledge_level`,
 (
    CASE
    WHEN (
        (`s`.`class` = 9)
        OR (`s`.`class` = 10)
    ) THEN
        '9-10'
    WHEN (
        (`s`.`class` = 11)
        OR (`s`.`class` = 12)
    ) THEN
        '11-12'
    ELSE
        '??'
    END
) AS `class_variant`
FROM
    (
        (
            (
                (
                    `dummy_hours` `d`
                    JOIN `Student` `s`
                )
                LEFT JOIN `StudentCourseAssoc` `sca` ON ((`s`.`id` = `sca`.`studentID`))
            )
            LEFT JOIN `StudentGroupAssoc` `b` ON ((`s`.`id` = `b`.`student_id`))
        )
        LEFT JOIN `Schedule` `sch` ON (
            (
                (
                    `sch`.`group_id` = `b`.`group_id`
                )
                AND (`d`.`hours` = `sch`.`hour`)
            )
        )
    )

Using this view gives full scene of current situation. But I still can't figure out the algorithm to

  • place students in groups
  • to place groups in rooms

in a most efficient, optimal way with minimal group count creation.

Any suggestions?

like image 866
demonoid Avatar asked Sep 07 '15 23:09

demonoid


3 Answers

This answer is just meant as a solution direction for the schedule part, not a 100% nice solution:

What you created, requires loops to be able to satisfy all the conditions.

To get such a case solved quicker, it can be practical to work in vectors instead in which in the vector all positions are represented by 0 (available) and 1 (taken).

So the student/math-1 issue:

Say there are 2 rooms and 3 hours: The math-1 vector per room is then:

Room 1: [0 0 0]
Room 2: [0 0 0]

Essentially (I at least) do not care about if a certain room is available as long as 1 is available: So an AND per index could be the answer in this case for availability (remember: 0 is available):

Room 1: [1 0 0] Room 2: [0 0 0] Room result: [1 0 0] AND [0 0 0]=[0 0 0]

So an AND can tell if the first hour is still available.

If you now combine this with a student with the available hours (also just 3 for this example):

Student A: [0 0 1] Room result: [0 0 0] Student matches with room using an OR for this operation: [0 0 1] OR [0 0 0]=[0 0 1]

So the student A would match into room result.

In SQL: The data model (part: Missing is the course match): Table room:

CREATE TABLE room(
room_id INT,
space TINYINT DEFAULT 0,
hour INT DEFAULT 1
);

CREATE TABLE student(
student_id INT,
space TINYINT DEFAULT 0,
hour INT DEFAULT 1
)

All data has been inserted into tables in full: In this case 1 room, 3 hours, 3 places available.

INSERT INTO room VALUES (1,0,1);
INSERT INTO room VALUES (1,0,1);
INSERT INTO room VALUES (1,0,1);
INSERT INTO room VALUES (1,0,2);
INSERT INTO room VALUES (1,0,2);
INSERT INTO room VALUES (1,0,2);
INSERT INTO room VALUES (1,0,3);
INSERT INTO room VALUES (1,0,3);
INSERT INTO room VALUES (1,0,3);

Student has:

INSERT INTO student VALUES(1,0,1);   
INSERT INTO student VALUES(1,0,2);   
INSERT INTO student VALUES(1,1,3);   

So the student is available in the first two hours only.

To now get a result from a query:

SELECT room_id
FROM room a
INNER JOIN student b ON a.space=b.space AND a.hour=b.hour;

This result only has to be split into the groups of maximum of 8, in which it is the end of the SQL part and time for another programming language.

This model can be expanded with a date, however it works best when using just hours and weekdays (weekday availability is again 0 or 1).

As I stated: this is a concept/idea, not a 100% solution, so it needs work before you can use it.....

like image 106
Norbert van Nobelen Avatar answered Oct 01 '22 06:10

Norbert van Nobelen


I believe what you are describing is a version of the constraint satisfaction problem, which is frequently used solve resource allocation issues. It is very likely that the solution is going to be NP-complete, or in other words, the time needed to solve the problem will grow exponentially as the size of the problem (in this case the number of students/classes/rooms) grows. This is one of the classic outstanding problems in computer science. There is no known perfect solution, but that doesn't mean there isn't something that will be useful in your situation. I'll try to describe your problem in a bit more detailed fashion before suggesting a path toward a solution.

Two problems

You have at least two problems that you are trying to solve:

  1. Is it possible to find any combination of students-groups-classes that will fit into the available times-rooms?
  2. From the possible combinations, is one of them more optimal than another? And is it possible to determine which combination is optimal in a reasonable amount of time?

First, it is very likely that there are no possible combinations that would satisfy your constraints. To demonstrate this, imagine that you have only two students and only one classroom that is available for only one hour. If the two students can be put into the same group, then it is possible to schedule them both into the one classroom at the same time. However, if the two students cannot be grouped, e.g. one is "dumb" and one is "clever," then there is no combination of resources that will satisfy your constraints.

While it is easy to determine if a solution exists in a very simple case like I have described, it is very difficult to determine if a solution even exists for an arbitrarily large set of students/classes/rooms.

Setting Reasonable Limits

First of all, it is easy to set an absolute upper bound on the number of students that can be enrolled. The theoretical max enrollment equals

rooms * hours * students/room / hours/student

So for example if you have 100 rooms available for 5 hours each, each room can hold 8 students, and each student needs to study for 5 hours:

100 * 5 * 8 / 5 = 800 students

However, it is extremely unlikely, given a random collection of students of varying grades and ability levels, that you'll ever be able to accommodate this theoretical maximum.

If we come from the other end of the spectrum, assume you have 500 classroom hours (100 rooms * 5 hours), then you know that you can always accommodate at least 100 students (1 student per room * 5 hours). The trick is to figure out a reasonable upper bound, between 100 and 800, that makes this problem solvable in a reasonable amount of time.

In order to make a reasonable guess at what this upper bound should be, it seems prudent to look at the constraints on group formation.

Grouping Constraints

Students are classified in two dimensions:

  1. Ability Level: Dumb, Normal, Clever (D, N, C)
  2. Grade level: 9, 10, 11, 12

Which means you have 12 categories of student: 9D, 9N, 9C, 10D, 10N, 10C, ...

Only some of those categories are compatible with each other for grouping, which gives you a finite number of potential group types. Assuming you only had 12 students, 1 of each of the 12 types, the theoretical maximum number of group types (assuming ANY student type can be paired with any other type), would be 12!/4! = 19,958,400. But given the constraints the actual possible number of group types will be smaller. As it turns out, I think we can safely reduce the group types to four, each of which is made up of some combination of students of types:

  1. 9D, 9N, 10D, 10N
  2. 9N, 9C, 10N, 10C
  3. 11D, 11N, 12D, 12N
  4. 11N, 11C, 12N, 12C

There is some obvious overlap here since "normal" students can belong to more than one group type. But we're finally beginning to get some information that would be useful for group formation:

Start by assigning students in the most restrictive categories to groups first. Then add students in less restrictive groups.

In other words, students in the "dumb" and "clever" categories can only belong to one of the four group types, so they should be assigned first. So the algorithm might look like:

  1. For each course
  2. Select all of the clever/dumb students in grades 9/10 or 11/12
  3. Create as many groups of 8 as you can with students in that category
  4. Fill up the remaining groups with empty slots with "normal" students
  5. Group the remaining "normal" students into groups of 8

This should result in a grouping with the minimum number of groups possible. The problem with this is that it is only one of thousands (maybe millions) of other groupings that is possible. It is highly unlikely that this particular grouping will be the right one. We still can swap students in different groups, but we need a smart way to do this.

Scheduling Constraints

Now that you've got students assigned to groups, you can begin putting the groups in classroom/time slots. The primary constraint here is that you can't schedule two groups into a time slot which would require the student to be in more than one place at the same time.

Let's again start with a simpler example that we can actually envision in our heads. Let's assume there are only four courses, Art, Music, Math and Science, that will be taught in 2 time slots in 4 classrooms. We will have 8 groups of 2 students, noting that each student will be in 2 of the groups since each student is taking two of the available classes. For simplicity, let's assume all of students are in the same category, e.g. 9N, so can be swapped between groups without problem. The students represented by the letters A-H and a group is represented by two letters, e.g. group AB contains students A and B. Let's say the first schedule generated by the system looks like this:

         Art  Music  Math  Science
Time_1    AB   CD     EF    AH
Time_2    CD   EF     GH    GB

Each course is taught twice, and we see that all of the groups are made up of a valid set of students, but we see that students A and G are both double booked: A has two classes at Time_1 and G has two classes at Time_2. The simple thing to do is to swap A and G in their science times:

         Art  Music  Math  Science
Time_1    AB   CD     EF    GH
Time_2    CD   EF     GH    AB

But there are also more complex solutions that involve moving a lot of people around and changing all of the groups, e.g.:

         Art  Music  Math  Science
Time_1    AC   ED     GF   BH
Time_2    BD   FC     HE   AG

Clearly, one of these is more efficient than the other, but there is no easy way for a computer to tell the difference. As humans we can see the solution relatively quickly in this example, but imagine dozens of courses with 8 students in each and you see how quickly this becomes a mess. Obviously we don't want to have to check all of the possible permutations in a brute force to find a solution.

Of course another solution would be just to add more time slots, e.g.:

         Art  Music  Math  Science
Time_1    AB   CD     EF    GH
Time_2    CD   EF     H     B
Time_3                G     A

Computationally, this is simpler and faster, but obviously doesn't optimize classroom space and teacher time, and of course it wouldn't be possible if all of the possible time slots already had classes in them.

Being Intelligent About It

Let's step back for a second and think about what we know about the whole system. Here's some things we know:

  1. Similar students are likely to choose similar sets of classes
  2. If you have a set of groups of students who all are taking the same set of classes it is straightforward to schedule them

For example, if we have 4 students (2 groups of 2) who all want to take the same set of classes, it is easy to just fit the groups into a matrix:

         Class_1 Class_2
Time_1     AB      CD
Time_2     CD      AB

Doing it this way we can be confident in advance that there will be no conflicts. This is straightforward, scales well, and brings us to our second insight.

Start by creating groups of students who are all taking the same set of classes.

Taking this into account, we might change our algorithm above to the following:

  1. For each student in a restrictive category (i.e. dumb/clever)
  2. Loop through all of the other students in that category and create groups from all of the ones who have chosen the same set of classes
  3. If there is a group left with <8 students, fill that one with students in a non-restrictive group (normal) that have chosen that same set of classes
  4. As students are added to groups, remove them from the total pool
  5. Repeat this for all of the students in restrictive categories
  6. Schedule all of these students in a grid matrix fashion
  7. Repeat this for remaining students in the normal category

With any luck, by the time this is done you will have a much smaller set of students who have a more challenging set of scheduling requirements.

What Next?

From here the smartest thing to do would seem to depend on how many students are left in the unscheduled pool. Possibilities include:

  • Repeat the above strategy but grouping students with 4 out 5 classes in common
  • Create a list of courses that need to be created based on the requested courses in the remaining pool, then loop through each of those courses and add as many students as you can to each sequentially starting with students in more restrictive categories and filling in with the normal students
  • If the number is small enough, just manually create any remaining courses

At some point I think you'll find that it's easier just to assign schedules manually to any "oddballs."

You might consider figuring out a way to give a slightly higher weight to grouping students who are in the same grade.

Code examples

Here are some snippets of code that might help. Note, in re-reading your problem I just realized that knowledge_level is assigned on a per-course basis and not to the student overall. I'll try to make adjustments to account for that.

// function to determine whether two students have selected the same classes
function studentsSelectedSameClasses($s1, $s2) {
    // returns true if students selected the same set of classes
    // returns false other wise
    // this takes into account knowledge_level and will consider
    // a class the same if the knowledge_levels are compatible
}

// create arrays of unscheduled students, i.e. not yet in groups, by grade
// 9th/10th and 11th/12th together since they can be in the same classes
$unscheduled["9_10"] = Student::find()->whereIn('class', [9,10])->all();
$unscheduled["11_G"] = Student::find()->whereIn('class', [11,G])->all();

// copy this array into another array from which we'll remove
// students as they get put into groups
$pool = $unscheduled;

// loop through unscheduled; try to match on course selections
foreach($unscheduled as $grade => $students) {
    foreach($students as $i => $student) {
        // make sure they are still in the pool, i.e. not already in a group
        if(!in_array($student, $pool[$grade]) continue;

        // now loop through other students
        foreach($pool[$grade] as $j => $potential_classmate) {
            if(studentsSelectedSameClasses($student,$potential_classmate)){
                // create new groups for each class if necessary
                // add both students to the groups if necessary
                // remove them from the $pool
                // if the group size reaches 8 go on to the next unscheduled
            }
        }
    }
}

// At this point $pool may not be empty, but you should have a bunch of 
// easily scheduled groups and a much smaller pool to resolve

Thanks for a fun problem. I enjoyed thinking about it and hope this helps!

like image 24
morphatic Avatar answered Oct 01 '22 05:10

morphatic


Interesting problem and for me I would have a suggestion on an approach al be it my brain does not construct logical problems in math ways but I have been seen to show intelligence beyond hairdressing so here I go.

I can follow the proposed lack of constriction's and it made me think of linear problems / programming that also needs precise constraints to calculate an optimum. But also that we can cut back a matrix calculation in half by first dividing it by 2 and then search a result in the lower or upper half as it can't be in both. But there is nothing to cut in half so I then thought that its more the reasonable to assume that there has to be a real life making sense sort of thing here or it won't work or adds up to quickly is my term for it:D

So I now suggest that there is logic in this approach: courses are linear going from intro course to exam. So no student that has taken the intro course may take it again as it's just stupid (Einstein et al:). So every student that has attended math1 can be excluded for that course. So if we make a gradual approach with math 1 to 5 and the rule that all courses must be attended where a course level may not differ more than -2 from the student level that equals courses attended in a linear fashion for a given student, then all students that have attended course 1 can be excluded for that class. So for math 2 only students with levels, or courses attended, 0 and 1 can attend. For math 3 students with level 1 or 2 can attend. So if we start creating the largest groups of students that can attend any course and start of by making a cut on smart and stupid straight away to save time paring pointless options as a level 4 student can never attend the same math class as a level 0,1 student?

Or something like that. Am working on that graph for this but it looks more like my neighbors' garage at this moment so dont expected much I guess..

like image 1
Daniel Mulder Avatar answered Oct 01 '22 04:10

Daniel Mulder