Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating a weighted sum of values from different tables

Tags:

mysql

I'm trying to create a list of students whose behaviour is statistically worst across each of our school's year groups.

We have a table named students.

We then have behavioural flags and alerts, plus sanctions.

However, different categories of flag/alert/sanction are deemed more serious than others. These are stored with labels in their respective _categories table, e.g. flag_categories and sanction_categories. The flag table will then have a column called Category_ID (alerts is a bit different as it's just a Type field with 'A', 'C', 'P' and 'S' values).

If I want to look at data which shows our highest-flagged students in a specific year group, I'd run this query:

SELECT
  CONCAT(stu.Firstname, " ", stu.Surname) AS `Student`,
  COUNT(f.ID) AS `Flags`
FROM `students` stu
LEFT JOIN `flags` f ON f.Student_ID = stu.id
WHERE stu.Year_Group = 9
GROUP BY stu.id
ORDER BY `Flags` DESC
LIMIT 0, 20

If I wanted to show our students with the most Crisis alerts in a specific year group, I'd run this query:

SELECT
  CONCAT(stu.Firstname, " ", stu.Surname) AS `Student`,
  COUNT(f.ID) AS `Flags`
FROM `students` stu
LEFT JOIN `flags` f ON f.Student_ID = stu.id
WHERE stu.Year_Group = 9
AND f.Category_ID = 10
GROUP BY stu.id
ORDER BY `Flags` DESC
LIMIT 0, 20

If I want to find how many Late or Mobile flags a student has, and perhaps add these together (with weightings), I can run the following query:

SELECT
  CONCAT(stu.Firstname, " ", stu.Surname) AS `Student`,
  SUM(CASE WHEN f.Category_ID = 10 THEN 1 ELSE 0 END) AS `Late Flags`,
  SUM(CASE WHEN f.Category_ID = 12 THEN 2 ELSE 0 END) AS `Mobile Flags`,
  ## not sure about this line below... is there a nicer way of doing it? `Late Flags` isn't recognised as a field apparently
  ## so I can't just do ( `Late Flags` + `Mobile Flags` )
  (SUM(CASE WHEN f.Category_ID = 10 THEN 1 ELSE 0 END) + SUM(CASE WHEN f.Category_ID = 12 THEN 2 ELSE 0 END)) AS `Points`
FROM `flags` f
LEFT JOIN `students` stu ON f.Student_ID = stu.id
WHERE stu.Year_Group = 9
GROUP BY stu.id
ORDER BY `Points` DESC
LIMIT 0, 20

What I don't understand is how I would do this across myriad tables. I need to be able to weight:

  • Late (flags, Category_ID = 10), Absconded (flags, Category_ID = 15) and Community flags (flags, Category_ID = 13) plus Safeguarding alerts (alerts, Type = 'S') are all worth 1 point
  • Behavioural flags (flags, Category_ID IN (1, 7, 8)) are worth 2 points
  • Process alerts (alerts, Type = 'P') and detention sanctions (sanctions, Category_ID = 1) are worth 3 points

So on and so forth. That's far from an exhaustive list but I've included enough variables to help me get my head round a multi-table weighted sum.

The outcome I'm looking for is just 2 columns - student's name and weighted points.

So, according to the bullet points above, if a student has received 2 Late flags (1 point each) and 1 Process alert (3 points), the output should just say Joe Bloggs and 5.

Can anyone help me to understand how I can get these weighted values from different tables into one SUM'd output for each student?

[edit] SQLFiddle here: http://sqlfiddle.com/#!9/449218/1/0

like image 367
turbonerd Avatar asked Aug 03 '16 10:08

turbonerd


People also ask

What is a weighted sum of values?

Weighted Sum works by multiplying the designated field values for each input raster by the specified weight. It then sums (adds) all input rasters together to create an output raster.

How do I do weighting in Excel?

To calculate the weighted average in Excel, you must use the SUMPRODUCT and SUM functions using the following formula: =SUMPRODUCT(X:X,X:X)/SUM(X:X) This formula works by multiplying each value by its weight and combining the values. Then, you divide the SUMPRODUCT but the sum of the weights for your weighted average.

Is SUMPRODUCT the same as weighted average?

SUMPRODUCT is essentially the Sum of Test 1 times its weight, plus the Mid-term times its weight, and so on. To get the Weighted Average, you divide by the Total of the weights. If we had just averaged the Test scores, the value would be 75.5, a significant difference.


1 Answers

Note, I am not doing this for the bounty. Please give to someone else.

This could be done with a few LEFT JOINs of derived tables. Note you did not supply the sanctions table. But the below would appear to be well illustrative. So I created a temp table. It would seem to allow for maximum flexibility without overcomplicating a larger left join notion that might be hard to debug. Afterall, you said your real querying will be much more complicated than this. As such, build out the temp table structure more.

This loads a tmp table up with default 0's for the students in the "passed by parameter Student Year" to a stored procedure. Two updates are performed. Then selects for a result set.

Schema / Load:

create schema s38741386; -- create a test database
use s38741386;

CREATE TABLE `students` (
  `id` int(11) PRIMARY KEY,
  `Firstname` varchar(50) NOT NULL,
  `Surname` varchar(50) NOT NULL,
  `Year_Group` int(2) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

# STUDENT INSERTS
INSERT INTO `students`
  (`id`, `Firstname`, `Surname`, `Year_Group`)
VALUES
  (201, 'Student', 'A', 9),
  (202, 'Student', 'B', 9),
  (203, 'Student', 'C', 9),
  (204, 'Student', 'D', 9),
  (205, 'Student', 'E', 9);

CREATE TABLE `alert` (
  `ID` int(11) PRIMARY KEY,
  `Staff_ID` int(6) NOT NULL,
  `Datetime_Raised` datetime NOT NULL,
  `Room_Label` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `Type` enum('A','C','P','Q','S') COLLATE utf8_unicode_ci NOT NULL COMMENT 'A=Absconded, C=Crisis, P=Process, Q=Quiet, S=Safeguarding',
  `Details` text COLLATE utf8_unicode_ci,
  `Responder` int(8) DEFAULT NULL,
  `Datetime_Responded` datetime DEFAULT NULL,
  `Room_ID` int(11) NOT NULL COMMENT 'will be linked to internal room id.',
  `Status` varchar(1) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'O:ngoing, R:esolved'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

# ALERT INSERTS
INSERT INTO `alert`
  (`ID`, `Staff_ID`, `Datetime_Raised`, `Room_Label`, `Type`, `Details`, `Responder`, `Datetime_Responded`, `Room_ID`, `Status`)
VALUES
  (1, '101', '2016-08-04 00:00:00', NULL, 'P', NULL, '103', '2016-08-04 00:00:01', '15', 'R'),
  (2, '102', '2016-08-04 00:00:00', NULL, 'P', NULL, '103', '2016-08-04 00:00:01', '15', 'R'),
  (3, '102', '2016-08-04 00:00:00', NULL, 'P', NULL, '103', '2016-08-04 00:00:01', '15', 'R'),
  (4, '101', '2016-08-04 00:00:00', NULL, 'P', NULL, '103', '2016-08-04 00:00:01', '15', 'R');

CREATE TABLE `alert_students` (
  `ID` int(11) PRIMARY KEY,
  `Alert_ID` int(6) NOT NULL,
  `Student_ID` int(6) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

# ALERT_STUDENT INSERTS
INSERT INTO `alert_students`
  (`ID`, `Alert_ID`, `Student_ID`)
VALUES
  (1, '1', '201'),
  (2, '1', '202'),
  (3, '2', '201'),
  (4, '3', '202'),
  (5, '4', '203'),
  (6, '5', '204');

CREATE TABLE `flags` (
  `ID` int(11) PRIMARY KEY,
  `Staff_ID` int(11) NOT NULL,
  `Student_ID` int(11) NOT NULL,
  `Datetime` datetime NOT NULL,
  `Category_ID` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


# ALERT INSERTS
-- TRUNCATE TABLE flags;
INSERT INTO `flags`
  (`ID`, `Staff_ID`, `Student_ID`, `Datetime`, `Category_ID`)
VALUES
  (1, '101', '201', '2016-08-04 00:00:01', 10),
  (2, '102', '202', '2016-08-04 00:00:02', 12),
  (3, '102', '203', '2016-08-04 00:00:03', 10),
  (4, '101', '204', '2016-08-04 00:00:04', 13),
  (5, '102', '202', '2016-08-04 00:00:02', 12),
  (6, '102', '203', '2016-08-04 00:00:03', 10),
  (7, '101', '204', '2016-08-04 00:00:04', 13),
  (8, '102', '202', '2016-08-04 00:00:02', 10),
  (9, '102', '203', '2016-08-04 00:00:03', 10),
  (10, '101', '204', '2016-08-04 00:00:04', 7),
  (11, '101', '204', '2016-08-04 00:00:07', 8),
  (12, '101', '204', '2016-08-04 00:00:08', 1),
  (13, '101', '204', '2016-08-04 00:00:09', 8);

Stored Procedure:

DROP PROCEDURE IF EXISTS rptSM_by_year;
DELIMITER $$
CREATE PROCEDURE rptSM_by_year
(   pSY INT -- parameter student year
)
BEGIN


    DROP TEMPORARY TABLE IF EXISTS tmpStudentMetrics;
    CREATE TEMPORARY TABLE tmpStudentMetrics
    (   `StudentId` int(11) PRIMARY KEY,
        LateFP INT NOT NULL,
        MobiFP INT NOT NULL,
        AbscFP INT NOT NULL,
        CommFP INT NOT NULL,
        SafeAP INT NOT NULL,
        BehaFP INT NOT NULL,
        ProcAP INT NOT NULL
    )ENGINE=InnoDB;

    INSERT tmpStudentMetrics (StudentId,LateFP,MobiFP,AbscFP,CommFP,SafeAP,BehaFP,ProcAP)
    SELECT id,0,0,0,0,0,0,0
    FROM students
    WHERE Year_Group = pSY;

    UPDATE tmpStudentMetrics tmp
    JOIN
    (   SELECT
          stu.id,
          SUM(CASE WHEN f.Category_ID = 10 THEN 1 ELSE 0 END) AS `LateFP`,
          SUM(CASE WHEN f.Category_ID = 15 THEN 1 ELSE 0 END) AS `AbscFP`,
          SUM(CASE WHEN f.Category_ID = 13 THEN 1 ELSE 0 END) AS `CommFP`,
          SUM(CASE WHEN f.Category_ID = 12 THEN 2 ELSE 0 END) AS `MobiFP`,
          SUM(CASE WHEN f.Category_ID IN (1,7,8) THEN 2 ELSE 0 END) AS `BehaFP`
        FROM `flags` f
        LEFT JOIN `students` stu ON f.Student_ID = stu.id
        WHERE stu.Year_Group = pSY
        GROUP BY stu.id
    ) xDerived
    ON xDerived.id=tmp.StudentId
    SET tmp.LateFP=xDerived.LateFP,
    tmp.AbscFP=xDerived.AbscFP,
    tmp.CommFP=xDerived.CommFP,
    tmp.MobiFP=xDerived.MobiFP,
    tmp.BehaFP=xDerived.BehaFP;

    UPDATE tmpStudentMetrics tmp
    JOIN
    (   SELECT
          stu.id,
          SUM(CASE WHEN a.Type = 'S' THEN 1 ELSE 0 END) AS `SafeAP`,
          SUM(CASE WHEN a.Type = 'P' THEN 3 ELSE 0 END) AS `ProcAP`
        FROM `alert_students` als
        JOIN `alert` a
        ON a.ID=als.Alert_ID
        JOIN `students` stu 
        ON stu.id=als.Student_ID and stu.Year_Group = pSY
        GROUP BY stu.id
    ) xDerived
    ON xDerived.id=tmp.StudentId
    SET tmp.SafeAP=xDerived.SafeAP,
    tmp.ProcAP=xDerived.ProcAP;

    -- SELECT * FROM tmpStudentMetrics; -- check detail

    SELECT stu.id, 
    CONCAT(stu.Firstname, " ", stu.Surname) AS `Student`, 
    tmp.LateFP+tmp.MobiFP+tmp.AbscFP+tmp.CommFP+tmp.SafeAP+tmp.BehaFP+tmp.ProcAP AS `Points` 
    FROM `students` stu 
    JOIN tmpStudentMetrics tmp 
    ON tmp.StudentId=stu.id 
    WHERE stu.`Year_Group` = pSY 
    ORDER BY stu.id; 

    -- SELECT * FROM tmpStudentMetrics; -- check detail
    DROP TEMPORARY TABLE IF EXISTS tmpStudentMetrics;
    -- TEMP TABLES are connection based. Explicityly dropped above for safety when done.
    -- Depends on your connection type and life-span otherwise.
END$$
DELIMITER ;

Test:

call rptSM_by_year(9);
+-----+-----------+--------+
| id  | Student   | Points |
+-----+-----------+--------+
| 201 | Student A |      7 |
| 202 | Student B |     11 |
| 203 | Student C |      6 |
| 204 | Student D |     10 |
| 205 | Student E |      0 |
+-----+-----------+--------+

Cleanup:

drop schema s38741386; -- drop the test database
like image 89
Drew Avatar answered Sep 30 '22 15:09

Drew