
How do I group records having time difference of more than an hour?

I'm new to this site but bear with me.

I'm trying to GROUP BY some data using SQL Server.

Here's the data:

Computer    VisitDate
ComputerA   2012-04-28 09:00:00
ComputerA   2012-04-28 09:05:00
ComputerA   2012-04-28 09:10:00
ComputerB   2012-04-28 09:30:00
ComputerB   2012-04-28 09:32:00
ComputerB   2012-04-28 09:44:00
ComputerB   2012-04-28 09:56:00
ComputerB   2012-04-28 10:25:00
ComputerA   2012-04-28 12:25:00
ComputerC   2012-04-28 12:30:00
ComputerC   2012-04-28 12:35:00
ComputerC   2012-04-28 12:45:00
ComputerC   2012-04-28 12:55:00

What I'm trying to achieve is to group the data by Computer, but also to start a new group whenever two consecutive visits from the same Computer are more than 1 hour apart. Here is the result of what I'm trying to do:

Computer     VisitDate
ComputerA    2012-04-28 09:00:00
ComputerB    2012-04-28 09:30:00
ComputerA    2012-04-28 12:25:00
ComputerC    2012-04-28 12:30:00

So ComputerA is shown twice because it visited at 09:10:00 and then visited again at 12:25:00, which is a gap of over 1 hour.

It's easy to 'GROUP BY Computer', but for the other part I wouldn't know where to start. Any help on this problem would be much appreciated.

Jefferson asked Apr 28 '12 14:04


2 Answers

You cannot do this with a simple GROUP BY. GROUP BY groups rows on values computed from each row by itself (a column such as the computer name, or an expression over that row's columns). It cannot express a condition that compares one row with another, such as "the time difference to the previous visit must be greater than one hour".

What you can do, provided you're on SQL Server 2005 or newer (you didn't mention the version in your question), is use CTEs (Common Table Expressions). Those provide a way to slice'n'dice your data.

Here, I'm doing several things. The first CTE "partitions" the data by ComputerName, orders each partition by VisitDate, and uses ROW_NUMBER() to give each row a sequential number within its partition. The second CTE picks the "first" entry for each computer, the one with row number = 1. The third CTE computes the difference in VisitDate between each entry and that first entry. From the third CTE, I finally select the entries that either have row number = 1 (the first for each "partition") or have a difference of 60 minutes or more. Note that the comparison is against each computer's first visit, not the visit immediately before: this gives the expected result for the sample data, but any visit 60 minutes or more after the first one is returned, even if it closely follows another visit.

Here's the code:

;WITH Computers AS
(
    -- Number each computer's visits in chronological order
    SELECT
        ComputerName, VisitDate,
        RN = ROW_NUMBER() OVER(PARTITION BY ComputerName ORDER BY VisitDate)
    FROM
        dbo.YourComputerTable
),
FirstComputers AS
(
    -- The earliest visit of each computer
    SELECT ComputerName, VisitDate
    FROM Computers
    WHERE RN = 1
),
SelectedComputers AS
(
    -- Minutes between each visit and that computer's earliest visit
    SELECT
        c.ComputerName, c.VisitDate, c.RN,
        DiffToFirst = ABS(DATEDIFF(MINUTE, c.VisitDate, fc.VisitDate))
    FROM Computers c
    INNER JOIN FirstComputers fc ON c.ComputerName = fc.ComputerName
)
-- Keep each computer's first visit, plus any visit 60+ minutes after it
SELECT *
FROM SelectedComputers
WHERE RN = 1 OR DiffToFirst >= 60
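
The query above measures each visit against the computer's first visit. If the requirement is instead a gap of more than an hour since the previous visit, one possible variation that should still work on SQL Server 2005 is to join each numbered row to the row before it. This is only a minimal sketch and not part of the original answer; the @Visit table variable, the ComputerX rows, and the Computer column name are assumptions made for illustration.

```sql
-- Hypothetical sample data (not from the question): three visits by one
-- computer, each less than an hour after the one before it.
DECLARE @Visit TABLE (Computer varchar(20) NOT NULL, VisitDate datetime NOT NULL);

-- One INSERT per row, since multi-row VALUES is not available before SQL Server 2008
INSERT INTO @Visit (Computer, VisitDate) VALUES ('ComputerX', '2012-04-28T09:00:00');
INSERT INTO @Visit (Computer, VisitDate) VALUES ('ComputerX', '2012-04-28T09:50:00');
INSERT INTO @Visit (Computer, VisitDate) VALUES ('ComputerX', '2012-04-28T10:05:00');

;WITH Numbered AS
(
    -- Number each computer's visits in chronological order
    SELECT
        Computer, VisitDate,
        RN = ROW_NUMBER() OVER (PARTITION BY Computer ORDER BY VisitDate)
    FROM @Visit
)
SELECT cur.Computer, cur.VisitDate
FROM Numbered cur
-- Pair each visit with the same computer's immediately preceding visit, if any
LEFT JOIN Numbered prev
    ON prev.Computer = cur.Computer
   AND prev.RN = cur.RN - 1
-- Keep a visit if it has no predecessor or follows it by more than one hour
WHERE prev.RN IS NULL
   OR cur.VisitDate > DATEADD(HOUR, 1, prev.VisitDate)
ORDER BY cur.VisitDate;
```

With these three rows, only the 09:00 visit should come back, because each later visit follows its predecessor by less than an hour. The first-visit comparison would also return the 10:05 row, which is 65 minutes after 09:00.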

marc_s answered Sep 29 '22 14:09


If you're on SQL Server 2012 or later, you can use LAG to compare each visit with the previous visit from the same computer.

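with Lagged as (
  -- LastVisit is the previous visit by the same computer (null for its first visit)
  select
    Computer,
    VisitDate,
    LAG(VisitDate,1) over (
      partition by Computer
      order by VisitDate
    ) as LastVisit
  from @Visit
)
-- keep a visit if it is the computer's first, or comes more than an hour after the previous one
select
  Computer,
  VisitDate
from Lagged
where LastVisit is null
or VisitDate > dateadd(hour,1,LastVisit);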
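
The query above returns the first visit of each run. To treat each run as an actual group (for example, to count its visits), one common extension is to flag the rows that start a run, turn the flags into a running total per computer, and then GROUP BY that total. The following is a hedged sketch of that idea, not part of the original answer. It assumes SQL Server 2012 or later; the @Visit name comes from the query above, while the column types and the IsStart, SessionNo, and Visits names are assumptions introduced for illustration. The rows are the sample data from the question.

```sql
-- Table variable standing in for the real table (column types are assumptions)
DECLARE @Visit TABLE (Computer varchar(20) NOT NULL, VisitDate datetime NOT NULL);

INSERT INTO @Visit (Computer, VisitDate) VALUES
    ('ComputerA', '2012-04-28T09:00:00'),
    ('ComputerA', '2012-04-28T09:05:00'),
    ('ComputerA', '2012-04-28T09:10:00'),
    ('ComputerB', '2012-04-28T09:30:00'),
    ('ComputerB', '2012-04-28T09:32:00'),
    ('ComputerB', '2012-04-28T09:44:00'),
    ('ComputerB', '2012-04-28T09:56:00'),
    ('ComputerB', '2012-04-28T10:25:00'),
    ('ComputerA', '2012-04-28T12:25:00'),
    ('ComputerC', '2012-04-28T12:30:00'),
    ('ComputerC', '2012-04-28T12:35:00'),
    ('ComputerC', '2012-04-28T12:45:00'),
    ('ComputerC', '2012-04-28T12:55:00');

WITH Lagged AS (
    -- Previous visit by the same computer (NULL for its first visit)
    SELECT Computer, VisitDate,
           LAG(VisitDate, 1) OVER (PARTITION BY Computer ORDER BY VisitDate) AS LastVisit
    FROM @Visit
),
Flagged AS (
    -- 1 when a visit starts a new run, 0 when it continues the current one
    SELECT Computer, VisitDate,
           CASE WHEN LastVisit IS NULL
                  OR VisitDate > DATEADD(HOUR, 1, LastVisit)
                THEN 1 ELSE 0 END AS IsStart
    FROM Lagged
),
Sessions AS (
    -- Running total of start flags gives each run its own number per computer
    SELECT Computer, VisitDate,
           SUM(IsStart) OVER (PARTITION BY Computer ORDER BY VisitDate
                              ROWS UNBOUNDED PRECEDING) AS SessionNo
    FROM Flagged
)
SELECT Computer, MIN(VisitDate) AS VisitDate, COUNT(*) AS Visits
FROM Sessions
GROUP BY Computer, SessionNo
ORDER BY MIN(VisitDate);
```

On the sample data this should produce the four rows from the question (ComputerA 09:00, ComputerB 09:30, ComputerA 12:25, ComputerC 12:30), with visit counts of 3, 5, 1, and 4 respectively.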

Steve Kass answered Sep 29 '22 12:09