I have some data that I want to group based on the value of the data column.
If 3 or more consecutive rows have data greater than 10, those rows are the ones I want.
So for this data:
use tempdb;
go
set nocount on;
if object_id('t', 'U') is not null
drop table t;
go
create table t
(
id int primary key identity,
[when] datetime,
data int
)
go
insert into t([when], data) values ('20130801', 1);
insert into t([when], data) values ('20130802', 121);
insert into t([when], data) values ('20130803', 132);
insert into t([when], data) values ('20130804', 15);
insert into t([when], data) values ('20130805', 9);
insert into t([when], data) values ('20130806', 1435);
insert into t([when], data) values ('20130807', 143);
insert into t([when], data) values ('20130808', 18);
insert into t([when], data) values ('20130809', 19);
insert into t([when], data) values ('20130810', 1);
insert into t([when], data) values ('20130811', 1234);
insert into t([when], data) values ('20130812', 124);
insert into t([when], data) values ('20130813', 6);
select * from t;
What I want is:
id when data
----------- ----------------------- -----------
2 2013-08-02 00:00:00.000 121
3 2013-08-03 00:00:00.000 132
4 2013-08-04 00:00:00.000 15
6 2013-08-06 00:00:00.000 1435
7 2013-08-07 00:00:00.000 143
8 2013-08-08 00:00:00.000 18
9 2013-08-09 00:00:00.000 19
How can I do that?
The standard gaps-and-islands solution is to group by (value minus ROW_NUMBER), since that difference is invariant within a consecutive sequence. The start and end dates of each island are then just the MIN() and MAX() of the group.
Begin by applying the DENSE_RANK function to the rows. To produce the group identifier, we subtract the result of DENSE_RANK from the row value. As the sequence increases, the result of this calculation remains constant but then changes when a new sequence starts. We use this constant to identify the islands.
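As a rough sketch of the grouping idea described above, the query below summarises each island instead of listing its rows. It assumes the table t from the question and that id has no gaps; ROW_NUMBER is used in place of DENSE_RANK, which gives the same numbering here because id is unique. The column aliases island_start, island_end and rows_in_island are made up for illustration.

```sql
-- Sketch only: assumes table t from the question and gap-free id values.
WITH partitioned AS (
    SELECT id, [when], data,
           -- id minus its position among the qualifying rows stays constant
           -- for as long as the qualifying ids are consecutive.
           id - ROW_NUMBER() OVER (ORDER BY id) AS grp
    FROM t
    WHERE data > 10
)
SELECT MIN([when]) AS island_start,    -- first date of the island
       MAX([when]) AS island_end,      -- last date of the island
       COUNT(*)    AS rows_in_island
FROM partitioned
GROUP BY grp
HAVING COUNT(*) >= 3                   -- keep only runs of 3 or more rows
ORDER BY island_start;
```

For the sample data this should give two islands: 2013-08-02 to 2013-08-04 (3 rows) and 2013-08-06 to 2013-08-09 (4 rows).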
SQL Server LAG() is a window function that provides access to a row at a specified physical offset which comes before the current row. In other words, by using the LAG() function, from the current row, you can access data of the previous row, or the row before the previous row, and so on.
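One possible way to apply LAG() (together with its counterpart LEAD()) to this question is sketched below. It assumes SQL Server 2012 or later, the table t from the question, and that ordering by id reflects the row order you care about. The CTE name flagged and the aliases prev2, prev1, next1 and next2 are illustrative. Because the threshold of 3 is hard-coded into the offsets, this approach does not generalise as easily as the grouping approach.

```sql
-- Sketch only: a row belongs to a run of 3+ if it is the last, middle,
-- or first row of some window of three consecutive rows with data > 10.
WITH flagged AS (
    SELECT id, [when], data,
           LAG(data, 2)  OVER (ORDER BY id) AS prev2,  -- two rows back
           LAG(data, 1)  OVER (ORDER BY id) AS prev1,  -- previous row
           LEAD(data, 1) OVER (ORDER BY id) AS next1,  -- next row
           LEAD(data, 2) OVER (ORDER BY id) AS next2   -- two rows ahead
    FROM t
)
SELECT id, [when], data
FROM flagged
WHERE data > 10
  AND (   (prev2 > 10 AND prev1 > 10)   -- current row ends a triple
       OR (prev1 > 10 AND next1 > 10)   -- current row is in the middle
       OR (next1 > 10 AND next2 > 10))  -- current row starts a triple
ORDER BY id;
```

At the edges of the table LAG and LEAD return NULL, so the comparisons evaluate to UNKNOWN and those branches simply do not match.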
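Try this:
WITH cte AS
(
    -- pt = number of qualifying rows that share the same cnt value
    SELECT *, COUNT(1) OVER (PARTITION BY cnt) AS pt
    FROM
    (
        -- cnt = number of earlier rows with data <= 10
        SELECT tt.*,
               (SELECT COUNT(id) FROM t WHERE data <= 10 AND id < tt.id) AS cnt
        FROM t tt
        WHERE data > 10
    ) t1
)
SELECT id, [when], data FROM cte WHERE pt >= 3;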
SQL FIDDLE DEMO
OUTPUT
id when data
2 2013-08-02 00:00:00.000 121
3 2013-08-03 00:00:00.000 132
4 2013-08-04 00:00:00.000 15
6 2013-08-06 00:00:00.000 1435
7 2013-08-07 00:00:00.000 143
8 2013-08-08 00:00:00.000 18
9 2013-08-09 00:00:00.000 19
EDIT
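First, for each row, the inner query counts the number of earlier rows (lower id) where data <= 10. Rows in the same consecutive run of data > 10 therefore share the same count.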
SELECT tt.*
,(SELECT COUNT(id) FROM t WHERE data <= 10 AND ID < tt.ID) AS cnt
FROM t tt
output
id when data cnt
1 2013-08-01 00:00:00.000 1 0
2 2013-08-02 00:00:00.000 121 1
3 2013-08-03 00:00:00.000 132 1
4 2013-08-04 00:00:00.000 15 1
5 2013-08-05 00:00:00.000 9 1
6 2013-08-06 00:00:00.000 1435 2
7 2013-08-07 00:00:00.000 143 2
8 2013-08-08 00:00:00.000 18 2
9 2013-08-09 00:00:00.000 19 2
10 2013-08-10 00:00:00.000 1 2
11 2013-08-11 00:00:00.000 1234 3
12 2013-08-12 00:00:00.000 124 3
13 2013-08-13 00:00:00.000 6 3
Then we filter the records with data > 10
WHERE data > 10
Now we count the records in each partition of the cnt column
SELECT *,COUNT(1) OVER(PARTITION BY cnt) pt FROM
(
SELECT tt.*
,(SELECT COUNT(id) FROM t WHERE data <= 10 AND ID < tt.ID) AS cnt
FROM t tt
WHERE data > 10
) t1
Output
id when data cnt pt
2 2013-08-02 00:00:00.000 121 1 3
3 2013-08-03 00:00:00.000 132 1 3
4 2013-08-04 00:00:00.000 15 1 3
6 2013-08-06 00:00:00.000 1435 2 4
7 2013-08-07 00:00:00.000 143 2 4
8 2013-08-08 00:00:00.000 18 2 4
9 2013-08-09 00:00:00.000 19 2 4
11 2013-08-11 00:00:00.000 1234 3 2
12 2013-08-12 00:00:00.000 124 3 2
The above query is wrapped in a CTE, which acts much like a temporary table.
Now select the rows whose consecutive run contains 3 or more records (pt >= 3)
SELECT id, [when], data FROM cte WHERE pt >= 3
ANOTHER SOLUTION
;WITH partitioned AS (
SELECT *, id - ROW_NUMBER() OVER (ORDER BY id) AS grp
FROM t
WHERE data > 10
),
counted AS (
SELECT *, COUNT(*) OVER (PARTITION BY grp) AS cnt
FROM partitioned
)
SELECT id, [when], data
FROM counted
WHERE cnt >= 3
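The query above relies on id being gap-free, which an identity column does not guarantee (deleted rows or rolled-back inserts can leave holes). A possible variant, sketched here under the assumption that "consecutive" means adjacent when ordered by id, numbers all rows first and subtracts from that instead of from id. The CTE name numbered and the alias rn_all are illustrative.

```sql
-- Sketch only: same idea as above, but tolerant of gaps in id.
WITH numbered AS (
    SELECT id, [when], data,
           ROW_NUMBER() OVER (ORDER BY id) AS rn_all  -- position among all rows
    FROM t
),
partitioned AS (
    SELECT id, [when], data,
           -- position among all rows minus position among qualifying rows
           rn_all - ROW_NUMBER() OVER (ORDER BY id) AS grp
    FROM numbered
    WHERE data > 10
),
counted AS (
    SELECT id, [when], data,
           COUNT(*) OVER (PARTITION BY grp) AS cnt    -- size of each island
    FROM partitioned
)
SELECT id, [when], data
FROM counted
WHERE cnt >= 3
ORDER BY id;
```

On the sample data, where id has no gaps, this should return the same rows as the original query.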
Reference URL
SQL FIDDLE DEMO