Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to group together sequential, timestamped rows in SQL and return the date range for each group

I've got an MS SQL 2008 database table that looks like the following:

Registration | Date | DriverID | TrailerID

An example of what some of the data would look like is as follows:

AB53EDH,2013/07/03 10:00,54,23
AB53EDH,2013/07/03 10:01,54,23
...
AB53EDH,2013/07/03 10:45,54,23
AB53EDH,2013/07/03 10:46,54,NULL <-- Trailer changed
AB53EDH,2013/07/03 10:47,54,NULL
...
AB53EDH,2013/07/03 11:05,54,NULL
AB53EDH,2013/07/03 11:06,54,102  <-- Trailer changed
AB53EDH,2013/07/03 11:07,54,102
...
AB53EDH,2013/07/03 12:32,54,102
AB53EDH,2013/07/03 12:33,72,102  <-- Driver changed
AB53EDH,2013/07/03 12:34,72,102

As you can see, the data represents which driver and which trailer were attached to which registration at any point in time. What I'd like to do is to generate a report that contains periods that each combination of driver and trailer were active for. So for the above example data, I'd want to generate something that looks like this:

Registration,StartDate,EndDate,DriverID,TrailerID
AB53EDH,2013/07/03 10:00,2013/07/03 10:45,54,23
AB53EDH,2013/07/03 10:46,2013/07/03 11:05,54,NULL
AB53EDH,2013/07/03 11:06,2013/07/03 12:32,54,102
AB53EDH,2013/07/03 12:33,2013/07/03 12:34,72,102

How would you go about doing this via SQL?

UPDATE: Thanks to the answers so far. Unfortunately, they stopped working when I applied it to production data I have. The queries submitted so far fail to work correctly when applied on part of the data.

Here's some sample queries to generate a data table and populate it with the dummy data above. There is more data here than in the example above: the driver,trailer combinations 54,23 and 54,NULL have been repeated in order to make sure that queries recognise that these are two distinct groups. I've also replicated the same data three times with different date ranges, in order to test if queries will work when run on part of the data set:

CREATE TABLE [dbo].[TempTable](
    [Registration] [nvarchar](50) NOT NULL,
    [Date] [datetime] NOT NULL,
    [DriverID] [int] NULL,
    [TrailerID] [int] NULL
)

INSERT INTO dbo.TempTable
VALUES 
('AB53EDH','2013/07/03 10:00', 54,23),
('AB53EDH','2013/07/03 10:01', 54,23),
('AB53EDH','2013/07/03 10:45', 54,23),
('AB53EDH','2013/07/03 10:46', 54,NULL),
('AB53EDH','2013/07/03 10:47', 54,NULL),
('AB53EDH','2013/07/03 11:05', 54,NULL),
('AB53EDH','2013/07/03 11:06', 54,102),
('AB53EDH','2013/07/03 11:07', 54,102),
('AB53EDH','2013/07/03 12:32', 54,102),
('AB53EDH','2013/07/03 12:33', 72,102),
('AB53EDH','2013/07/03 12:34', 72,102),
('AB53EDH','2013/07/03 13:00', 54,102),
('AB53EDH','2013/07/03 13:01', 54,102),
('AB53EDH','2013/07/03 13:02', 54,102),
('AB53EDH','2013/07/03 13:03', 54,102),
('AB53EDH','2013/07/03 13:04', 54,23),
('AB53EDH','2013/07/03 13:05', 54,23),
('AB53EDH','2013/07/03 13:06', 54,23),
('AB53EDH','2013/07/03 13:07', 54,NULL),
('AB53EDH','2013/07/03 13:08', 54,NULL),
('AB53EDH','2013/07/03 13:09', 54,NULL),
('AB53EDH','2013/07/03 13:10', 54,NULL),
('AB53EDH','2013/07/03 13:11', NULL,NULL)

INSERT INTO dbo.TempTable
SELECT Registration, DATEADD(M, -1, Date), DriverID, TrailerID
FROM dbo.TempTable
WHERE Date > '2013/07/01'

INSERT INTO dbo.TempTable
SELECT Registration, DATEADD(M, 1, Date), DriverID, TrailerID
FROM dbo.TempTable
WHERE Date > '2013/07/01'
like image 213
Amr Bekhit Avatar asked Jul 03 '13 13:07

Amr Bekhit


People also ask

How do you group data together in SQL?

The Group By statement is used to group together any rows of a column with the same value stored in them, based on a function specified in the statement. Generally, these functions are one of the aggregate functions such as MAX() and SUM(). This statement is used with the SELECT command in SQL.

How do I write a SQL query for a specific date range and date time?

SELECT * FROM YourTable. WHERE [dateColumn] >DATEADD(day,1,'4/25/2022') AND [dateColumn] <= DATEADD(day,1,'4/26/2022') AND DATEPART(hh,[dateColumn]) >= 7 AND DATEPART(hh,[dateColumn]) <= 19.

Can you group by date in SQL?

To group by date part, use the GROUP BY clause and the EXTRACT() function. Pass EXTRACT() the date parts to isolate.


2 Answers

This query uses CTEs to:

  1. Create an ordered collection of records grouped by Registration
  2. For each record, capture the data of the previous record
  3. Compare current and previous data to determine if the current record is a new instance of a driver / trailer assignment
  4. Get only the new records
  5. For each new record, get the last date before a new driver / trailer assignment occurs

Link to SQL Fiddle

Code below:

;WITH c AS (
-- Group records by Registration, assign row numbers in order of date
SELECT
  ROW_NUMBER() OVER (
    PARTITION BY Registration 
    ORDER BY Registration, [Date]) 
  AS Rn,
  Registration,
  [Date],
  DriverID,
  TrailerID
FROM 
  TempTable
)
,c2 AS (
-- Self join to table to get Driver and Trailer from previous record
SELECT 
  t1.Rn,
  t1.Registration,
  t1.[Date],
  t1.DriverID,
  t1.TrailerID,
  t2.DriverID AS PrevDriverID,
  t2.TrailerID AS PrevTrailerID
FROM 
  c t1
LEFT OUTER JOIN 
  c t2
ON 
  t1.Registration = t2.Registration
AND 
  t2.Rn = t1.Rn - 1 
)
,c3 AS (
-- Use INTERSECT to determine if this record is new in sequence
SELECT
  Rn,
  Registration,
  [Date],
  DriverID,
  TrailerID,
  CASE WHEN NOT EXISTS (
            SELECT DriverID, TrailerID 
            INTERSECT 
            SELECT PrevDriverID, PrevTrailerID) 
       THEN 1
       ELSE 0
  END AS IsNew
FROM c2 
) 
-- For all new records in sequence, 
-- get the last date logged before a new record appeared
SELECT 
  Registration,
  [Date] AS StartDate,
  COALESCE (
    (
       SELECT TOP 1 [Date]
       FROM c3 
       WHERE Registration = t.Registration
       AND Rn < (
         SELECT TOP 1 Rn
         FROM c3 
         WHERE Registration = t.Registration 
         AND Rn > t.Rn 
         AND IsNew = 1 
         ORDER BY Rn )
       ORDER BY Rn DESC 
    )
    , [Date]) AS EndDate,
  DriverID,
  TrailerID
FROM 
  c3 t
WHERE
  IsNew = 1 
ORDER BY 
  Registration,
  StartDate
like image 99
8kb Avatar answered Oct 29 '22 18:10

8kb


try-:

DECLARE @TempTable AS TABLE (
    [Registration] [nvarchar](50) NOT NULL,
    [Date] [datetime] NOT NULL,
    [DriverID] [int] NULL,
    [TrailerID] [int] NULL
)

INSERT INTO @TempTable
VALUES 
('AB53EDH','2013-07-03 10:00', 54,23),
('AB53EDH','2013-07-03 10:01', 54,23),
('AB53EDH','2013-07-03 10:45', 54,23),
('AB53EDH','2013-07-03 10:46', 54,nULL),
('AB53EDH','2013-07-03 10:47', 54,NULL),
('AB53EDH','2013-07-03 11:05', 54,NULL),
('AB53EDH','2013-07-03 11:06', 54,102),
('AB53EDH','2013-07-03 11:07', 54,102),
('AB53EDH','2013-07-03 12:32', 54,102),
('AB53EDH','2013-07-03 12:33', 72,102),
('AB53EDH','2013-07-03 12:34', 72,102)

SELECT t1.Registration, MIN(t1.Date) AS StartDate, MAX(t1.date) AS EndDate, t1.DriverID, t1.TrailerID
FROM @TempTable AS t1
INNER JOIN @TempTable AS t2
    ON t1.Registration = t2.Registration AND (t1.DriverID = t2.DriverID OR t1.TrailerID = t2.TrailerID)
GROUP BY t1.Registration, t1.DriverID, t1.TrailerID
    ORDER BY MIN(t1.Date)
like image 31
armen Avatar answered Oct 29 '22 18:10

armen