Grouping and counting rows by value until it changes

Tags: sql, sql-server-2014

I have a table where messages are stored as they happen. Usually there is a message 'A', and sometimes the A's are separated by a single message 'B'. Now I want to group the values so I can analyze them, for example by finding the longest 'A'-streak or the distribution of 'A'-streaks.

I already tried a COUNT ... OVER query, but it keeps counting across all messages instead of restarting whenever the message changes.

SELECT message, COUNT(*) OVER (ORDER BY Timestamp RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

This is my example data:

Timestamp        Message
20150329 00:00   A
20150329 00:01   A
20150329 00:02   B
20150329 00:03   A
20150329 00:04   A
20150329 00:05   A
20150329 00:06   B

I want the following output:

Message    COUNT
A          2
B          1
A          3
B          1

asked Mar 29 '15 09:03

dwonisch

2 Answers

That was interesting :)

;WITH cte as (
SELECT Messages.Message, Timestamp, 
ROW_NUMBER() OVER(PARTITION BY Message ORDER BY Timestamp) AS gn,
ROW_NUMBER() OVER (ORDER BY Timestamp) AS rn
FROM Messages
), cte2 AS (
SELECT Message, Timestamp, gn, rn, gn - rn  as gb
FROM cte 
), cte3 AS (
SELECT Message, MIN(Timestamp) As Ts, COUNT(1) as Cnt
FROM cte2
GROUP BY Message, gb)
SELECT Message, Cnt FROM cte3
ORDER BY Ts

Here is the result set:

Message   Cnt
A         2
B         1
A         3
B         1

The query could be shorter, but I posted it this way so you can see what's happening. The result is exactly as requested. The most important part is gn - rn: the idea is to number the rows within each Message partition and, at the same time, number the rows in the whole set. Subtracting one from the other gives a value that stays constant across a consecutive run of the same message, so it serves as the 'rank' of each group.

;WITH cte as (
SELECT Messages.Message, Timestamp, 
ROW_NUMBER() OVER(PARTITION BY Message ORDER BY Timestamp) AS gn,
ROW_NUMBER() OVER (ORDER BY Timestamp) AS rn
FROM Messages
), cte2 AS (
SELECT Message, Timestamp, gn, rn, gn - rn  as gb
FROM cte 
)
SELECT * FROM cte2

Message  Timestamp                gn  rn  gb
A        2015-03-29 00:00:00.000  1   1   0
A        2015-03-29 00:01:00.000  2   2   0
B        2015-03-29 00:02:00.000  1   3   -2
A        2015-03-29 00:03:00.000  3   4   -1
A        2015-03-29 00:04:00.000  4   5   -1
A        2015-03-29 00:05:00.000  5   6   -1
B        2015-03-29 00:06:00.000  2   7   -5
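
As noted above, the query can be shorter. Here is a minimal sketch of a more compact form, with the two ROW_NUMBER() calls subtracted inline in a single CTE. The table variable @Messages and the CTE name numbered are made up for illustration so the example runs on its own, and it assumes the timestamps are unique, since ties would make the row numbering ambiguous.

```sql
-- Sample rows from the question, held in a hypothetical table variable.
DECLARE @Messages TABLE ( [Timestamp] DATETIME, Message CHAR(1) );

INSERT INTO @Messages
VALUES ( '2015-03-29T00:00:00', 'A' ),
       ( '2015-03-29T00:01:00', 'A' ),
       ( '2015-03-29T00:02:00', 'B' ),
       ( '2015-03-29T00:03:00', 'A' ),
       ( '2015-03-29T00:04:00', 'A' ),
       ( '2015-03-29T00:05:00', 'A' ),
       ( '2015-03-29T00:06:00', 'B' );

WITH numbered AS (
    SELECT Message, [Timestamp],
           -- per-message row number minus overall row number:
           -- constant within one consecutive run of the same message
           ROW_NUMBER() OVER (PARTITION BY Message ORDER BY [Timestamp])
         - ROW_NUMBER() OVER (ORDER BY [Timestamp]) AS gb
    FROM @Messages
)
SELECT Message, COUNT(*) AS Cnt
FROM numbered
GROUP BY Message, gb
ORDER BY MIN([Timestamp]);   -- list the streaks in chronological order
```

On the sample data this should give the same four rows as the longer query: A 2, B 1, A 3, B 1.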

answered Nov 06 '22 21:11

Mihail Shishkov

Here is a slightly shorter solution:

DECLARE @t TABLE ( d DATE, m CHAR(1) )

INSERT  INTO @t
VALUES  ( '20150301', 'A' ),
        ( '20150302', 'A' ),
        ( '20150303', 'B' ),
        ( '20150304', 'A' ),
        ( '20150305', 'A' ),
        ( '20150306', 'A' ),
        ( '20150307', 'B' );

WITH 
c1 AS(SELECT d, m, IIF(LAG(m, 1, m) OVER(ORDER BY d) = m, 0, 1) AS n FROM @t),
c2 AS(SELECT m, SUM(n) OVER(ORDER BY d) AS n FROM c1) 
    SELECT m, COUNT(*) AS c
    FROM c2
    GROUP BY m, n
    ORDER BY n -- n grows over time, so this lists the streaks chronologically

Output:

m   c
A   2
B   1
A   3
B   1

The idea is to get the value 1 on every row where the message differs from the previous row's message:

2015-03-01  A   0
2015-03-02  A   0
2015-03-03  B   1
2015-03-04  A   1
2015-03-05  A   0
2015-03-06  A   0
2015-03-07  B   1

The second step is a running sum: the current row's value plus all preceding values:

2015-03-01  A   0
2015-03-02  A   0
2015-03-03  B   1
2015-03-04  A   2
2015-03-05  A   2
2015-03-06  A   2
2015-03-07  B   3
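
Both intermediate columns can be inspected side by side with one query. This is only a sketch on the same sample data: the column names changed and grp are illustrative (the answer's query calls both n), and the explicit ROWS frame is my choice to make the running sum unambiguous.

```sql
DECLARE @t TABLE ( d DATE, m CHAR(1) );

INSERT INTO @t
VALUES ( '20150301', 'A' ), ( '20150302', 'A' ), ( '20150303', 'B' ),
       ( '20150304', 'A' ), ( '20150305', 'A' ), ( '20150306', 'A' ),
       ( '20150307', 'B' );

WITH c1 AS (
    SELECT d, m,
           -- LAG's third argument defaults the first row to its own value,
           -- so the first row is never flagged as a change
           IIF(LAG(m, 1, m) OVER (ORDER BY d) = m, 0, 1) AS changed
    FROM @t
)
SELECT d, m, changed,
       -- running total of the change flags = streak number
       SUM(changed) OVER (ORDER BY d ROWS UNBOUNDED PRECEDING) AS grp
FROM c1
ORDER BY d;
```

The changed column should match the first listing (0, 0, 1, 1, 0, 0, 1) and grp the second (0, 0, 1, 2, 2, 2, 3).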

This way you can group by the message column together with the calculated column: each distinct (m, n) pair is one streak.
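
The question also asks for the longest 'A'-streak and the distribution of 'A'-streaks. One possible way to get both is to aggregate the per-streak counts a second time. This is a sketch rather than part of the original answer, and the names changed, grp, streaks, streak_len and streak_count are made up for illustration.

```sql
DECLARE @t TABLE ( d DATE, m CHAR(1) );

INSERT INTO @t
VALUES ( '20150301', 'A' ), ( '20150302', 'A' ), ( '20150303', 'B' ),
       ( '20150304', 'A' ), ( '20150305', 'A' ), ( '20150306', 'A' ),
       ( '20150307', 'B' );

WITH c1 AS (
    -- 1 on rows where the message differs from the previous row
    SELECT d, m,
           IIF(LAG(m, 1, m) OVER (ORDER BY d) = m, 0, 1) AS changed
    FROM @t
),
c2 AS (
    -- running sum of the flags numbers the streaks
    SELECT m,
           SUM(changed) OVER (ORDER BY d ROWS UNBOUNDED PRECEDING) AS grp
    FROM c1
),
streaks AS (
    -- one row per streak with its length
    SELECT m, grp, COUNT(*) AS streak_len
    FROM c2
    GROUP BY m, grp
)
SELECT streak_len, COUNT(*) AS streak_count
FROM streaks
WHERE m = 'A'
GROUP BY streak_len
ORDER BY streak_len DESC;   -- first row is the longest 'A'-streak
```

On the sample data this should return two rows: one streak of length 3 and one of length 2. Replacing the final SELECT with SELECT MAX(streak_len) FROM streaks WHERE m = 'A' would return only the longest streak.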

answered Nov 06 '22 22:11

Giorgi Nakeuri
