Avoid subquery to select records from same table based on date of base record

I have a StudentScores table, shown below, in SQL Server 2012. The grading system is weighted using special rules. For each MATHS result of a student, there must be exactly one row in the result set. That row may or may not have values in the SCIENCE and LITERATURE columns, depending on whether a SCIENCE score exists within two months of the MATHS result date and a LITERATURE score exists within one month of the MATHS result date.
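
As a small illustration of how those two windows can be computed from a single MATHS result date, here is a sketch using DATEADD. The variable name @MathsResultDate is hypothetical and only serves the example:

DECLARE @MathsResultDate DATETIME = '2016-03-25';

-- Earliest dates that still count for each subject, relative to the MATHS date
SELECT DATEADD(MONTH, -2, @MathsResultDate) AS ScienceWindowStart,    -- 2016-01-25
       DATEADD(MONTH, -1, @MathsResultDate) AS LiteratureWindowStart; -- 2016-02-25

DATEADD(MONTH, ...) clamps to the last day of the target month, so a MATHS date of 2016-03-31 gives a LITERATURE window start of 2016-02-29.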

Note: This is a scenario I created to simplify my actual business domain problem.

I wrote the following query using subqueries. Is there a way to rewrite it without subqueries so that it runs more efficiently?

TABLE

DECLARE @StudentScores TABLE (StudentMarkID INT IDENTITY(1,1) NOT NULL, StudentID INT, SubjectCode VARCHAR(10), ResultDate DATETIME, Score DECIMAL(5,2))
INSERT INTO @StudentScores (StudentID,SubjectCode,ResultDate,Score)
SELECT 1, 'MATHS','2016-01-10',35
UNION ALL
SELECT 1, 'LITERATURE','2016-01-10',62
UNION ALL
SELECT 1, 'SCIENCE','2016-01-30',65
UNION ALL
SELECT 1, 'SCIENCE','2016-02-02',61
UNION ALL
SELECT 1, 'LITERATURE','2016-02-03',60
UNION ALL
SELECT 1, 'MATHS','2016-03-25',55
UNION ALL
SELECT 2, 'LITERATURE','2016-01-10',12
UNION ALL
SELECT 2, 'SCIENCE','2016-01-30',14
UNION ALL
SELECT 2, 'SCIENCE','2016-02-14',12
UNION ALL
SELECT 2, 'LITERATURE','2016-02-14',15
UNION ALL
SELECT 2, 'MATHS','2016-03-25',18

QUERY

SELECT SS.StudentID,
    SS.Score AS MathsScore,
    SS.ResultDate AS MathsResultDate,
    (SELECT TOP 1 S2.Score
            FROM @StudentScores S2
            WHERE S2.StudentID = SS.StudentID
            AND S2.SubjectCode = 'SCIENCE'
            AND S2.ResultDate >= DATEADD(MONTH, -2, SS.ResultDate)
            ORDER BY S2.ResultDate DESC
    ) AS ScienceScore,
    (SELECT TOP 1 S2.ResultDate
            FROM @StudentScores S2
            WHERE S2.StudentID = SS.StudentID
            AND S2.SubjectCode = 'SCIENCE'
            AND S2.ResultDate >= DATEADD(MONTH, -2, SS.ResultDate)
            ORDER BY S2.ResultDate DESC
    ) AS ScienceResultDate,
    (SELECT TOP 1 S2.Score
            FROM @StudentScores S2
            WHERE S2.StudentID = SS.StudentID
            AND S2.SubjectCode = 'LITERATURE'
            AND S2.ResultDate >= DATEADD(MONTH, -1, SS.ResultDate)
            ORDER BY S2.ResultDate DESC
    ) AS LiteratureScore,
    (SELECT TOP 1 S2.ResultDate
            FROM @StudentScores S2
            WHERE S2.StudentID = SS.StudentID
            AND S2.SubjectCode = 'LITERATURE'
            AND S2.ResultDate >= DATEADD(MONTH, -1, SS.ResultDate)
            ORDER BY S2.ResultDate DESC
    ) AS LiteratureResultDate
FROM @StudentScores SS
WHERE SS.SubjectCode = 'MATHS'

Expected Result

[image: expected result set]

LCJ asked Nov 07 '22 15:11


1 Answer

I managed to reduce the query to two reads of the table: one to get the MATHS rows, whose dates drive the lookups for the other subjects, and one to get the SCIENCE and LITERATURE rows:

WITH DataSource_Maths AS
(
    SELECT SS.[StudentID]
          ,SS.[Score] AS [MathsScore]
          ,SS.[ResultDate] AS [MathsResultDate]
          -- this internal ID is used later, in the final joins against DataSource_Others,
          -- to match each SCIENCE/LITERATURE row to the MATHS row (and date window) it belongs to
          ,ROW_NUMBER() OVER(ORDER BY SS.[StudentID], SS.[ResultDate]) AS InternalID
    FROM @StudentScores SS
    WHERE SS.[SubjectCode] = 'MATHS'
),
DataSource_Others AS
(
    SELECT DS.[StudentID]
          ,DS.[SubjectCode]
          ,DS.[Score]
          ,DS.[ResultDate]
          ,DS.[RowID]
          ,SS.[InternalID]
    FROM DataSource_Maths SS
    OUTER APPLY
    (
        SELECT *
               -- rank the rows per student and subject, latest first (only the latest is kept later);
               -- this plays the role of TOP 1 in your query, except that DENSE_RANK keeps every row
               -- tied on the latest date, whereas ROW_NUMBER would keep exactly one
              ,DENSE_RANK() OVER (PARTITION BY [StudentID], [SubjectCode] ORDER BY [ResultDate] DESC) AS [RowID]
        FROM @StudentScores
        WHERE
        ( 
            ([ResultDate] >= DATEADD(MONTH, -2, SS.[MathsResultDate]) AND [SubjectCode] = 'SCIENCE')
            OR
            ([ResultDate] >= DATEADD(MONTH, -1, SS.[MathsResultDate]) AND [SubjectCode] = 'LITERATURE')
        ) AND [StudentID] = SS.[StudentID]
    ) DS
)
SELECT FDS_M.[StudentID]
      ,FDS_M.[MathsScore] AS [MathsScore]
      ,FDS_M.[MathsResultDate] AS [MathsResultDate]
      ,FDS_S.[Score] AS [ScienceScore]
      ,FDS_S.[ResultDate] AS [ScienceResultDate] 
      ,FDS_L.[Score] AS [LiteratureScore]
      ,FDS_L.[ResultDate] AS [LiteratureResultDate] 
FROM DataSource_Maths FDS_M
LEFT JOIN DataSource_Others FDS_S
    ON FDS_M.[InternalID] = FDS_S.[InternalID]
    AND FDS_S.[SubjectCode] = 'SCIENCE'
    AND FDS_S.[RowID] = 1
LEFT JOIN DataSource_Others FDS_L
    ON FDS_M.[InternalID] = FDS_L.[InternalID]
    AND FDS_L.[SubjectCode] = 'LITERATURE'
    AND FDS_L.[RowID] = 1;

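Of course, in your more complex real-world case you can materialize the CTEs in temporary tables (for example) in order to simplify and optimize the query. SQL Server does not cache CTE results, so DataSource_Others, which is referenced twice above, may be evaluated once per reference.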
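
A minimal sketch of that materialization, assuming the table variable @StudentScores from the question is declared and populated earlier in the same batch. The temp-table names #Maths and #Others are hypothetical. The sketch also swaps in ROW_NUMBER for DENSE_RANK (so exactly one row is kept per subject, like TOP 1) and CROSS APPLY for OUTER APPLY (the final LEFT JOINs already preserve MATHS rows that have no match):

-- Step 1: materialize the MATHS rows once, with a surrogate key per row
SELECT SS.StudentID,
       SS.Score AS MathsScore,
       SS.ResultDate AS MathsResultDate,
       ROW_NUMBER() OVER (ORDER BY SS.StudentID, SS.ResultDate) AS InternalID
INTO #Maths
FROM @StudentScores SS
WHERE SS.SubjectCode = 'MATHS';

-- Step 2: for each MATHS row, keep only the latest SCIENCE and LITERATURE row in its window
SELECT M.InternalID, DS.SubjectCode, DS.Score, DS.ResultDate
INTO #Others
FROM #Maths M
CROSS APPLY
(
    SELECT S.SubjectCode, S.Score, S.ResultDate,
           ROW_NUMBER() OVER (PARTITION BY S.SubjectCode ORDER BY S.ResultDate DESC) AS RowID
    FROM @StudentScores S
    WHERE S.StudentID = M.StudentID
      AND (   (S.SubjectCode = 'SCIENCE'    AND S.ResultDate >= DATEADD(MONTH, -2, M.MathsResultDate))
           OR (S.SubjectCode = 'LITERATURE' AND S.ResultDate >= DATEADD(MONTH, -1, M.MathsResultDate)))
) DS
WHERE DS.RowID = 1;

-- Step 3: pivot the two subjects back onto the MATHS rows
SELECT M.StudentID, M.MathsScore, M.MathsResultDate,
       S.Score AS ScienceScore, S.ResultDate AS ScienceResultDate,
       L.Score AS LiteratureScore, L.ResultDate AS LiteratureResultDate
FROM #Maths M
LEFT JOIN #Others S ON S.InternalID = M.InternalID AND S.SubjectCode = 'SCIENCE'
LEFT JOIN #Others L ON L.InternalID = M.InternalID AND L.SubjectCode = 'LITERATURE'
ORDER BY M.StudentID, M.MathsResultDate;

DROP TABLE #Maths;
DROP TABLE #Others;

Whether this beats the single CTE query depends on data volume and indexing, so it is worth comparing the actual execution plans on your real data.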

gotqn answered Nov 11 '22 16:11