Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Date Difference between consecutive rows - complicated

Tags:

sql

ms-access

I had previously posted a question which was answered but I need a query for this too. I have a table structure with data like this (dates in the format dd/mm/yyyy).

ID    Account Number    Unit    Admit_Date    Disch_Date
1     1001              w32     01/04/2012    
2     1002              w32     01/04/2012    01/04/2012
3     1001              ccu     03/04/2012
4     1001              w33     05/04/2012
5     1003              cicu    04/04/2012
6     1001              ccu     07/04/2012
7     1001              ccu     07/04/2012    10/04/2012
8     1003              w33     05/04/2012
9     1003              w33     05/04/2012    08/04/2012

Basically this table deals with patients getting admitted to a particular ward and transferred between wards and then finally discharged either on same day or few days later. The expected result from query would be:

Account_Number                                 No. Of Days
1001              01/04/2012    03/04/2012      2
1001              03/04/2012    05/04/2012      2
1001              05/03/2012    07/04/2012      2
1001              07/04/2012    10/04/2012      3
1002              01/04/2012    01/04/2012      0
1003              04/04/2012    05/04/2012      1
1003              05/04/2012    08/04/2012      3

The discharge date field will only be filled when the patient is discharged, hence I would like to calculate date difference between each date of movement of the patient including both admission and the date of discharge.

I use MS Access 2003.

I hope that some one will be able to help me with this.

like image 280
Mohammed Rishal Avatar asked Apr 11 '12 01:04

Mohammed Rishal


People also ask

How do I compare two consecutive rows in SQL?

Here's the SQL query to compare each row with previous row. In the above query, we join sales table with itself using an INNER JOIN condition g2.id=g1.id + 1 that allows you to compare each row with its previous row. Please note, this condition depends on the fact that our id column has consecutive numbers.

How do you calculate time lag between two consecutive orders?

How about this: select recordid, transdate, cast( (transdate - lag(transdate) over (order by transdate)) as time) as diff from t; In other words, you can subtract two datetime values and cast the result as a time. You can then format the result however you like.

How do you calculate row difference?

Method 1 : Using diff() method diff() method in base R is used to find the difference among all the pairs of consecutive rows in the R dataframe. It returns a vector with the length equivalent to the length of the input column – 1.

What is consecutive rows?

A row is consecutive with the previous when the current date equals the previous date plus one.


1 Answers

Filtering out the irrelevant data

With any complex query, part of the art is building up the query piece by piece, testing as you go.

I'm assuming that the table name is PatientMovements and that:

Given pairs of rows like ID = {6,7} and ID = {8,9}, it is correct to say that the row where the patient (account number), unit and admission date with null discharge date is ignored when there is also a record for the same patient, unit and admit date but a non-null discharge date.

So, step one is to generate the rows that we need to work on, filtering out the irrelevant data from the table recorded in the database. This is a UNION of two sets of data:

  1. Those rows with a non-null discharge date.
  2. Those rows with a null discharge date but no row for the same account, unit and admission date.

Clearly, the first part of the UNION is:

SELECT * FROM PatientMovements WHERE DischargeDate IS NOT NULL

Less obviously, the second part of the UNION is:

SELECT *
  FROM PatientMovements AS p1
 WHERE DischargeDate IS NULL
   AND NOT EXISTS
       (SELECT *
          FROM PatientMovements AS P2
         WHERE P1.Account   = P2.Account
           AND P1.Unit      = P2.Unit
           AND P1.AdmitDate = P2.AdmitDate
           AND P2.DischargeDate IS NOT NULL
       )

Now you can combine those into a single result set:

SELECT *
  FROM PatientMovements
 WHERE DischargeDate IS NOT NULL
UNION
SELECT *
  FROM PatientMovements AS p1
 WHERE DischargeDate IS NULL
   AND NOT EXISTS
       (SELECT *
          FROM PatientMovements AS P2
         WHERE P1.Account   = P2.Account
           AND P1.Unit      = P2.Unit
           AND P1.AdmitDate = P2.AdmitDate
           AND P2.DischargeDate IS NOT NULL
       )

You can verify the query above by checking that it returns rows with IDs 1..5, 7, and 9.

Warning: untested code. None of the SQL in this answer has been near a DBMS, so it is untested.

Applying Lessons Learned Previously

And then you can apply your learning from the other question to order the data and calculate the date differences, etc. The only complication is that you have to write that query out twice, which is painful (unless MS Access 2003 support the 'WITH' clause or common table expression).


But would there be no single query to obtain this required output?

The UNION is a single query, of course. I suppose you could just write:

SELECT *
  FROM PatientMovements
 WHERE (DischargeDate IS NOT NULL)
    OR (DischargeDate IS     NULL
        AND NOT EXISTS
            (SELECT *
               FROM PatientMovements AS P2
              WHERE P1.Account   = P2.Account
                AND P1.Unit      = P2.Unit
                AND P1.AdmitDate = P2.AdmitDate
                AND P2.DischargeDate IS NOT NULL
            )
       )

I can't immediately think of a more compact way of doing the query.


Building the UNION into 'The Other Answer'

The accepted answer to the other question has two possible solutions (as amended by comments and reformatted):

SELECT T1.ID, T1.AccountNumber, T1.Date, 
       MIN(T2.Date) AS NextDate, 
       DATEDIFF("D", T1.Date, MIN(T2.Date)) AS DaysDiff
  FROM YourTable T1
  JOIN YourTable T2
    ON T1.AccountNumber = T2.AccountNumber AND T2.Date > T1.Date

Or:

SELECT ID, AccountNumber, Date, NextDate,
       DATEDIFF("D", Date, NextDate) AS DaysDiff
  FROM (SELECT ID, AccountNumber, Date,
               (SELECT MIN(Date) 
                  FROM YourTable T2
                 WHERE T2.AccountNumber = T1.AccountNumber
                   AND T2.Date > T1.Date
               ) AS NextDate
          FROM YourTable T1
        ) AS T

As noted in a comment, the absence of the table name in the question leads to different table names appearing in the answer; what I called PatientMovements was called YourTable in this answer. The other difference is that the original question did not include the Unit or DischargeDate columns in the data. However, the UNION query I gave gives the relevant data on which to run these queries, so all that's left to do is write the UNION query into the other answers in place of YourTable. This leads to:

SELECT T1.ID, T1.AccountNumber, T1.Date, 
       MIN(T2.Date) AS NextDate, 
       DATEDIFF("D", T1.Date, MIN(T2.Date)) AS DaysDiff
  FROM (SELECT *
          FROM PatientMovements
         WHERE (DischargeDate IS NOT NULL)
            OR (DischargeDate IS     NULL
                AND NOT EXISTS
                    (SELECT *
                       FROM PatientMovements AS P2
                      WHERE P1.Account   = P2.Account
                        AND P1.Unit      = P2.Unit
                        AND P1.AdmitDate = P2.AdmitDate
                        AND P2.DischargeDate IS NOT NULL
                    )
               )
       ) AS T1
  JOIN (SELECT *
          FROM PatientMovements
         WHERE (DischargeDate IS NOT NULL)
            OR (DischargeDate IS     NULL
                AND NOT EXISTS
                    (SELECT *
                       FROM PatientMovements AS P2
                      WHERE P1.Account   = P2.Account
                        AND P1.Unit      = P2.Unit
                        AND P1.AdmitDate = P2.AdmitDate
                        AND P2.DischargeDate IS NOT NULL
                    )
               )
       ) AS T2
    ON T1.AccountNumber = T2.Accountnumber AND T2.Date > T1.Date

Or:

SELECT ID, AccountNumber, Date, NextDate,
       DATEDIFF("D", Date, NextDate) AS DaysDiff
  FROM (SELECT ID, AccountNumber, Date,
               (SELECT MIN(Date) 
                  FROM (SELECT *
                          FROM PatientMovements
                         WHERE (DischargeDate IS NOT NULL)
                            OR (DischargeDate IS     NULL
                                AND NOT EXISTS
                                    (SELECT *
                                       FROM PatientMovements AS P2
                                      WHERE P1.Account   = P2.Account
                                        AND P1.Unit      = P2.Unit
                                        AND P1.AdmitDate = P2.AdmitDate
                                        AND P2.DischargeDate IS NOT NULL
                                    )
                               )
                       ) AS T2
                 WHERE T2.Accountnumber = T1.AccountNumber
                   AND T2.Date > T1.Date
               ) AS NextDate
          FROM (SELECT *
                  FROM PatientMovements
                 WHERE (DischargeDate IS NOT NULL)
                    OR (DischargeDate IS     NULL
                        AND NOT EXISTS
                            (SELECT *
                               FROM PatientMovements AS P2
                              WHERE P1.Account   = P2.Account
                                AND P1.Unit      = P2.Unit
                                AND P1.AdmitDate = P2.AdmitDate
                                AND P2.DischargeDate IS NOT NULL
                            )
                       )
               ) AS T1
        ) AS T

So, as long as you are careful, and develop queries in fragments, and then combine them consistently, the most awful looking query can be tamed.

Common Table Expressions

Note that the SQL Standard has 'common table expressions' (CTEs) aka 'WITH clauses' which can make things still easier.

WITH YourTable AS
   (SELECT *
      FROM PatientMovements
     WHERE (DischargeDate IS NOT NULL)
        OR (DischargeDate IS     NULL
            AND NOT EXISTS
                (SELECT *
                   FROM PatientMovements AS P2
                  WHERE P1.Account   = P2.Account
                    AND P1.Unit      = P2.Unit
                    AND P1.AdmitDate = P2.AdmitDate
                    AND P2.DischargeDate IS NOT NULL
                )
           )
     )
SELECT T1.ID, T1.AccountNumber, T1.Date, 
       MIN(T2.Date) AS NextDate, 
       DATEDIFF("D", T1.Date, MIN(T2.Date)) AS DaysDiff
  FROM YourTable T1
  JOIN YourTable T2
    ON T1.AccountNumber = T2.AccountNumber AND T2.Date > T1.Date

Or:

WITH YourTable AS
   (SELECT *
      FROM PatientMovements
     WHERE (DischargeDate IS NOT NULL)
        OR (DischargeDate IS     NULL
            AND NOT EXISTS
                (SELECT *
                   FROM PatientMovements AS P2
                  WHERE P1.Account   = P2.Account
                    AND P1.Unit      = P2.Unit
                    AND P1.AdmitDate = P2.AdmitDate
                    AND P2.DischargeDate IS NOT NULL
                )
           )
     )
SELECT ID, AccountNumber, Date, NextDate,
       DATEDIFF("D", Date, NextDate) AS DaysDiff
  FROM (SELECT ID, AccountNumber, Date,
               (SELECT MIN(Date) 
                  FROM YourTable T2
                 WHERE T2.AccountNumber = T1.AccountNumber
                   AND T2.Date > T1.Date
               ) AS NextDate
          FROM YourTable T1
        ) AS T

One of the major advantages of using a CTE is that the optimizer is told explicitly that the table expressions is the same in all places it is used, whereas when it is written out several times, it might not spot that commonality. Plus, writing the query out several times opens up the possibility that the two 'meant to be the same' queries are actually slightly different because of an editing error; that possibility is precluded by the CTE. The other advantage in the current context was that combining the CTE with the solutions to the other question was child's play.

Sadly for you, it is unlikely that MS Access 2003 supports CTEs. I share your pain; the DBMS I work with mainly doesn't either.

like image 124
Jonathan Leffler Avatar answered Oct 18 '22 13:10

Jonathan Leffler