Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Representing "X of Y" in SQL

In my database, I have a lot of courses that are compulsory. Some are elective. However, there are courses of a third kind: A list from which you have to choose X courses. The list (and the number X) is different for each study program. How would you represent this relationally?

like image 891
Christian Neverdal Avatar asked Jul 05 '10 13:07

Christian Neverdal


4 Answers

I find it interesting that the accepted answer says, "There is no way to represent the 'X of Y' relationally" when that is essentially what the question was asking for. It seems to me that 'X of Y' can indeed be modelled (and largely enforced) using SQL and here's a suggested way:

Example scenario: students taking the course 'French' must choose two components (x) out of a total of three possible compnents (y).

CREATE TABLE Components
(
 component_name VARCHAR(100) NOT NULL, 
 UNIQUE (component_name)
);

INSERT INTO Components (component_name) VALUES 
('Oral'), 
('Writing'), 
('Vocab'), 
('Databases');

Clearly, 'Databases' doesn't belong on a course on French so we need a tables for course designers to model courses [these tables have many relevant candidate keys so for clarity I'll define them at the 'bottom' of the CREATE TABLE statement):

CREATE TABLE XofYCourses
(
 course_name VARCHAR(100) NOT NULL, 
 x_components_choice_tally INTEGER NOT NULL 
    CHECK (x_components_choice_tally > 0), 
 y_components_tally INTEGER NOT NULL
    CHECK (y_components_tally > 0), 
 CHECK (x_components_choice_tally < y_components_tally),
 UNIQUE (course_name), 
 UNIQUE (course_name, y_components_tally), 
 UNIQUE (course_name, x_components_choice_tally)
);


INSERT INTO XofYCourses (course_name, y_components_tally, 
x_components_choice_tally) VALUES 
('French', 2, 3);

The above allows us to model the 'two out of three' attribute of the French course. Now we need a table to model what the three possible components of that course actually are:

CREATE TABLE XofYCourseComponents
(
 course_name VARCHAR(100) NOT NULL, 
 y_components_tally INTEGER NOT NULL, 
 FOREIGN KEY (course_name, y_components_tally)
    REFERENCES XofYCourses (course_name, y_components_tally), 
 component_sequence INTEGER NOT NULL
    CHECK (component_sequence > 0), 
 component_name VARCHAR(100) NOT NULL 
    REFERENCES Components (component_name), 
 CHECK (component_sequence <= y_components_tally), 
 UNIQUE (course_name, component_sequence), 
 UNIQUE (course_name, component_name) 
);

INSERT INTO XofYCourseComponents (course_name, 
component_sequence, y_components_tally, component_name) 
VALUES 
('French', 1, 3, 'Oral'), 
('French', 2, 3, 'Writing'), 
('French', 3, 3, 'Vocab');

Now for enrolment. Billy want to do the French course...

CREATE TABLE Students
(
 student_name VARCHAR(20) NOT NULL, 
 UNIQUE (student_name)
);

INSERT INTO Students (student_name) VALUES ('Billy');

...and chooses 'Oral' and 'Vocab':

CREATE TABLE XofYCourseComponentChoices
(
 student_name VARCHAR(20) NOT NULL
    REFERENCES Students (student_name), 
 course_name VARCHAR(100) NOT NULL, 
 x_components_choice_tally INTEGER NOT NULL, 
 FOREIGN KEY (course_name, x_components_choice_tally)
    REFERENCES XofYCourses (course_name, x_components_choice_tally), 
 component_name VARCHAR(100) NOT NULL, 
 FOREIGN KEY (course_name, component_name)
    REFERENCES XofYCourseComponents (course_name, component_name), 
 x_component_sequence INTEGER NOT NULL
    CHECK (x_component_sequence > 0), 
 CHECK (x_component_sequence <= x_components_choice_tally), 
 UNIQUE (student_name, course_name, component_name), 
 UNIQUE (student_name, course_name, x_component_sequence)
);

INSERT INTO XofYCourseComponentChoices (student_name, course_name, 
component_name, x_component_sequence, x_components_choice_tally)
VALUES
('Billy', 'French', 'Oral', 1, 2), 
('Billy', 'French', 'Vocab', 2, 2);

The above structure works a good way of enforcing the maximum values i.e. no more than three components for the French course and no more than two choices for each student.

What it doesn't do, however, is to ensure exact amounts e.g. that Billy doesn't choose just one component. Standard SQL has solutions to this problem e.g. CHECK constraints that support subqueries (e.g. to count that there are a total of two rows for Billy...) and DEFERRABLE constraints (...but delay the count until the point when the transaction is being committed). Having a 'multiple assignment' feature would be even better. However, most SQL products don't have these features.

Does this lack of support for the full solution mean we don't do anything and just trust the application will refrain from writing invalid data? of course not!

A good interim approach is to revoke the privileges from the base tables and provide helper stored procedures e.g. one to enrol a student which takes their chosen course components as parameters: the count is done at after the INSERTs and if it violates the data rules (e.g. less than two for French) then the transaction is rolled back and an error returned.

like image 160
onedaywhen Avatar answered Sep 21 '22 01:09

onedaywhen


You need 3 tables here: StudyPrograms, Courses and Components. Components represents the Courses which comprise each StudyProgram and is a junction table between Courses and StudyPrograms.

Each Component record can contain a field indicating if the Course is a compulsory part of the StudyProgram. You can also include a field to indicate whether the Course is one of a list that can be chosen.

There is no way to represent the 'X of Y' relationally, you will need some logic in your stored procedures to ensure this business rule is followed (or possibly in your data access code layer, depending on how you want to organise the application).

like image 30
David Avatar answered Sep 20 '22 01:09

David


You have two options: you can model the data closer to reality, where some are single-course requirements, and others are X from Y courses requirements, or you could model all requirements as X from Y, where the single-course requirements are "1 from 1" requirements.

I would recommend something like this:

Course
---------------
CourseID
Description
...

Program
---------------
ProgramID
Description
...

CourseGroup
---------------
CourseGroupID
CourseID

ProgramCourseGroup
---------------
ProgramID
CourseGroupID
RequiredCourses

Course and Program are the two top-level tables. They define the simple list of all courses and programs, respectively, without any sort of relation between the two.

CourseGroup defines a group of courses. This relates to Course, but no other tables.

ProgramCourseGroup relates course groups to programs. A program indicates that a particular group of courses is required, then RequiredCourses indicates how many courses from that group must be taken in order to satisfy the requirement.

For instance, say you have a program called "Basket Weaving" that requires:

  • Intro to baskets
  • Basic weaving techniques

and two of the following four courses:

  • Easter baskets
  • Handbaskets
  • Picnic baskets
  • SCUBA Diving

Your data would look like:

Course
------------------------------------
CourseID    Description
1           Intro to baskets
2           Basic weaving techniques
3           Easter baskets
4           Handbaskets
5           Picnic baskets
6           SCUBA Diving

Program
--------------------------
ProgramID   Description
1           Basket Weaving

CourseGroup
--------------------------
CourseGroupID  CourseID
1              1
2              2
3              3
3              4
3              5
3              6

ProgramCourseGroup
-----------------------------------------
ProgramID  CourseGroupID  RequiredCourses
1          1              1
1          2              1
1          3              2
like image 37
Adam Robinson Avatar answered Sep 21 '22 01:09

Adam Robinson


In principle one would expect to be able to create a constraint something like this to enforce the rule:

CHECK
 (NOT EXISTS
  (SELECT 1
   FROM CourseEnrolement c, ProgramEnrolement p
   WHERE c.StudentId = p.StudentId
   AND c.ProgramId = p.ProgramId
   GROUP BY p.StudentId, p.ProgramId, p.NumberOfCoursesRequired
   HAVING COUNT(*) <> p.NumberOfCoursesRequired
 ))

Unfortunately, SQL makes it all but impossible to implement this, or at least makes it very difficult to update the database while the constraint is enforced. So if you really want to represent such a rule in the database then you may need a better model than SQL can offer. In practice such rules will often be enforced in application code instead.

like image 20
nvogel Avatar answered Sep 20 '22 01:09

nvogel