Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I group rows by DateTime if the values are almost the same?

How can I group rows if the different time between on field and other field in other row almost the same?

For example:

GUID - No    - sDateTime            - eDateTime           - Name  
------------------------------------------------------------------    
0000 - 01    - 2013-02-02 08:00:00  - 2013-02-02 08:40:00 - A 
0000 - 02    - 2013-02-02 08:45:00  - 2013-02-02 09:45:00 - A 
0000 - 03    - 2013-02-02 11:30:00  - 2013-02-02 12:00:00 - A 
0000 - 04    - 2013-02-02 09:55:00  - 2013-02-02 11:00:00 - A   
0000 - 05    - 2013-02-02 11:05:00  - 2013-02-02 12:30:00 - B   

How can I get sDateTime from row with eDateTime from other row if the different not more than 10 minutes?

GUID - No      - SDateTime            - EDateTime (From other row) - Name  
----------------------------------------------------------------------------  
0000 -01,02,04 - 2013-02-02 08:00:00  - **2013-02-02 11:00:00**    - A 
0000 - 03      - 2013-02-02 11:30:00  - 2013-02-02 12:00:00        - A
0000 - 05      - 2013-02-02 11:05:00  - 2013-02-02 12:30:00        - B 
like image 881
Mohammed Rabee Avatar asked Feb 12 '13 21:02

Mohammed Rabee


2 Answers

No cursors here but this is still a loop over rows using a recursive CTE to figure out what rows should be grouped together. The temp table #T is there to hold the ID that connects rows to each other. It could have been done without the temp table but for performance reasons it is better to do it with a temp table instead of having another CTE as the source for the recursive CTE.

In the final query I use the for xml path trick to concatenate the values from No. That means that the CTE C is used a second time so it will be executed twice. You might want to turn that CTE to a temp table as well to avoid that.

SQL Fiddle

MS SQL Server 2008 Schema Setup:

create table YourTable
(
  GUID varchar(4),
  No varchar(2),
  sDateTime datetime,
  eDateTime datetime,
  Name varchar(1)
);

insert into YourTable values
('0000', '01', '2013-02-02 08:00:00', '2013-02-02 08:40:00', 'A'),
('0000', '02', '2013-02-02 08:45:00', '2013-02-02 09:45:00', 'A'),
('0000', '03', '2013-02-02 11:30:00', '2013-02-02 12:00:00', 'A'),
('0000', '04', '2013-02-02 09:55:00', '2013-02-02 11:00:00', 'A'),
('0000', '05', '2013-02-02 11:05:00', '2013-02-02 12:30:00', 'B');

Query 1:

create table #T
(
  ID int,
  GUID varchar(4),
  No varchar(2),
  sDateTime datetime,
  eDateTime datetime,
  Name varchar(1),
  primary key(ID, GUID, Name)
);

insert into #T(ID, GUID, No, sDateTime, eDateTime, Name)
select row_number() over(partition by GUID, Name order by sDateTime),
       GUID, No, sDateTime, eDateTime, Name
from YourTable;

with C as
(
  select T.ID, T.GUID, T.No, T.sDateTime, T.eDateTime, T.Name, 1 as Grp
  from #T as T
  where T.ID = 1
  union all
  select T.ID, T.GUID, T.No, T.sDateTime, T.eDateTime, T.Name, 
         C.Grp + case when datediff(minute, C.eDateTime, T.sDateTime) > 10 
                   then 1 
                   else 0 
                 end
  from #T as T
    inner join C 
      on T.ID = C.ID + 1 and
         T.Name = C.Name and
         T.GUID = C.GUID
)
select C.GUID,
       (
       select ','+C2.No
       from C as C2
       where C.GUID = C2.GUID and
             C.Name = C2.Name and
             C.Grp = C2.Grp
       order by C2.No
       for xml path(''), type
       ).value('substring(text()[1], 2)', 'varchar(max)') as No,
       min(C.sDateTime) as sDateTime,
       max(C.eDateTime) as eDateTime,
       C.Name
from C
group by C.GUID, C.Name, C.Grp

drop table #T;

Results:

| GUID |       NO |                       SDATETIME |                       EDATETIME | NAME |
----------------------------------------------------------------------------------------------
| 0000 | 01,02,04 | February, 02 2013 08:00:00+0000 | February, 02 2013 11:00:00+0000 |    A |
| 0000 |       03 | February, 02 2013 11:30:00+0000 | February, 02 2013 12:00:00+0000 |    A |
| 0000 |       05 | February, 02 2013 11:05:00+0000 | February, 02 2013 12:30:00+0000 |    B |

Just for fun, a SQL Server 2012 version.

with CMinDiff as
(
  select GUID, No, sDateTime, eDateTime, Name, 
         case when datediff(minute, 
                            coalesce(lag(eDateTime) over(partition by GUID, Name 
                                                         order by eDateTime), 
                                     sDateTime), 
                            sDateTime) <= 10 
           then 0 
           else 1 
         end as MinDiff
  from YourTable
), CSumMinDiff as
(
  select GUID, No, sDateTime, eDateTime, Name, 
         sum(MinDiff) over(partition by GUID, Name 
                           order by sDateTime 
                           rows between unbounded preceding and current row) as Grp
  from CMinDiff
)
select C.GUID,
       (
       select ','+C2.No
       from CSumMinDiff as C2
       where C.GUID = C2.GUID and
             C.Name = C2.Name and
             C.Grp = C2.Grp
       order by C2.No
       for xml path(''), type
       ).value('substring(text()[1], 2)', 'varchar(max)') as No,
       min(C.sDateTime) as sDateTime,
       max(C.eDateTime) as eDateTime,
       C.Name
from CSumMinDiff as C
group by C.GUID, C.Name, C.Grp
like image 189
Mikael Eriksson Avatar answered Oct 11 '22 15:10

Mikael Eriksson


I was unable to perform this without a cursor, maybe somebody smarter can come up with a solution that does not include cursors, but this is what I have:

CREATE TABLE MyTable
(
    [GUID] VARCHAR(4),
    [No] VARCHAR(2),
    [sDateTime] DATETIME,
    [eDateTime] DATETIME,
    [Name] VARCHAR(1)
)

INSERT INTO [MyTable]
([GUID], [No], [sDateTime], [eDateTime], [Name])
VALUES
('0000', '01', '2013-02-02 08:00:00', '2013-02-02 08:40:00', 'A'),
('0000', '02', '2013-02-02 08:45:00', '2013-02-02 09:45:00', 'A'),
('0000', '03', '2013-02-02 11:30:00', '2013-02-02 12:00:00', 'A'),
('0000', '04', '2013-02-02 09:55:00', '2013-02-02 11:00:00', 'A'),
('0000', '05', '2013-02-02 11:05:00', '2013-02-02 12:30:00', 'B')

DECLARE @MyTable TABLE 
(
    [GUID] VARCHAR(4),
    [No] VARCHAR(100),
    [sDateTime] DATETIME,
    [eDateTime] VARCHAR(100),
    [Name] VARCHAR(1)
)

DECLARE [MyCursor] CURSOR FOR 
SELECT [GUID], [No], [sDateTime], [eDateTime], [Name]
FROM [MyTable]
ORDER BY [Name], [sDateTime];

DECLARE @GUID AS VARCHAR(4)
DECLARE @No AS VARCHAR(2)
DECLARE @sDateTime AS DATETIME
DECLARE @eDateTime AS DATETIME
DECLARE @Name AS VARCHAR(1)
DECLARE @OldName AS VARCHAR(1)
SET @OldName = ''
DECLARE @MergedNo AS VARCHAR(100)
SET @MergedNo = NULL
DECLARE @sOldDateTime AS DATETIME
DECLARE @eOldDateTime AS DATETIME

OPEN [MyCursor]
FETCH NEXT FROM [MyCursor] 
INTO @GUID, @No, @sDateTime, @eDateTime, @Name

WHILE @@FETCH_STATUS = 0
BEGIN
    IF (@OldName = '')
    BEGIN
        SET @sOldDateTime = @sDateTime
        SET @OldName = @Name
    END
    ELSE IF (@OldName <> @Name OR DATEDIFF(MINUTE, @eOldDateTime, @sDateTime) > 10)
    BEGIN
        INSERT INTO @MyTable
        ([GUID], [No], [sDateTime], [eDateTime], [Name])
        VALUES
        (@GUID, @MergedNo, @sOldDateTime, CASE WHEN @MergedNo LIKE '%,%' THEN '**' + CONVERT(VARCHAR(100), @eOldDateTime, 20) + '**' ELSE CONVERT(VARCHAR(100), @eOldDateTime, 20) END, @Name)

        SET @sOldDateTime = @sDateTime 
        SET @OldName = @Name
        SET @MergedNo = NULL
    END

    SET @MergedNo = COALESCE(@MergedNo + ', ', '') + @No
    SET @eOldDateTime = @eDateTime

    FETCH NEXT FROM [MyCursor] 
    INTO @GUID, @No, @sDateTime, @eDateTime, @Name
END 
CLOSE [MyCursor];
DEALLOCATE [MyCursor];

IF (@OldName <> '')
BEGIN
    INSERT INTO @MyTable
    ([GUID], [No], [sDateTime], [eDateTime], [Name])
    VALUES
    (@GUID, @MergedNo, @sOldDateTime, CASE WHEN @MergedNo LIKE '%,%' THEN '**' + CONVERT(VARCHAR(100), @eOldDateTime, 20) + '**' ELSE CONVERT(VARCHAR(100), @eOldDateTime, 20) END, @Name)
END

SELECT *
FROM @MyTable

DROP TABLE [MyTable]
like image 21
Geert Immerzeel Avatar answered Oct 11 '22 15:10

Geert Immerzeel