Group data by the change of grouping column value in order

Tags: sql, tsql, sql-server-2008, gaps-and-islands

With the following data:

create table #ph (product int, [date] date, price int)
insert into #ph select 1, '20120101', 1
insert into #ph select 1, '20120102', 1
insert into #ph select 1, '20120103', 1
insert into #ph select 1, '20120104', 1
insert into #ph select 1, '20120105', 2
insert into #ph select 1, '20120106', 2
insert into #ph select 1, '20120107', 2
insert into #ph select 1, '20120108', 2
insert into #ph select 1, '20120109', 1
insert into #ph select 1, '20120110', 1
insert into #ph select 1, '20120111', 1
insert into #ph select 1, '20120112', 1

I would like to produce the following output:

product | date_from | date_to  | price
  1     | 20120101  | 20120104 |   1
  1     | 20120105  | 20120108 |   2
  1     | 20120109  | 20120112 |   1

If I group by price and take the min and max date, I get the following, which is not what I want (note the overlapping date ranges):

product | date_from | date_to  | price
  1     | 20120101  | 20120112 |   1
  1     | 20120105  | 20120108 |   2

So essentially, what I'm looking for is to group by each step change in the data, based on the grouping columns product and price.
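For comparison, here is a minimal Python sketch (not T-SQL, and not part of the original question) of the two behaviours, using the sample rows above with dates kept as YYYYMMDD strings. The helper names `naive_groups` and `step_change_groups` are hypothetical. The first mimics a plain GROUP BY product, price with MIN/MAX of the date. The second uses `itertools.groupby`, which only groups consecutive rows with equal keys, so it assumes the rows are already sorted by product and date.

```python
from itertools import groupby

# Sample rows from the question as (product, date, price), sorted by product and date.
# Dates are 'YYYYMMDD' strings, so string comparison matches date order.
rows = [
    (1, "20120101", 1), (1, "20120102", 1), (1, "20120103", 1), (1, "20120104", 1),
    (1, "20120105", 2), (1, "20120106", 2), (1, "20120107", 2), (1, "20120108", 2),
    (1, "20120109", 1), (1, "20120110", 1), (1, "20120111", 1), (1, "20120112", 1),
]


def naive_groups(rows):
    """Like GROUP BY product, price with MIN/MAX(date): all rows with equal prices merge."""
    ranges = {}
    for product, date, price in rows:
        lo, hi = ranges.get((product, price), (date, date))
        ranges[(product, price)] = (min(lo, date), max(hi, date))
    return [(p, lo, hi, price) for (p, price), (lo, hi) in ranges.items()]


def step_change_groups(rows):
    """Group only consecutive rows sharing the same (product, price)."""
    result = []
    for (product, price), run in groupby(rows, key=lambda r: (r[0], r[2])):
        dates = [r[1] for r in run]
        # Rows are in date order, so the first and last dates are MIN and MAX.
        result.append((product, dates[0], dates[-1], price))
    return result


print(naive_groups(rows))
# [(1, '20120101', '20120112', 1), (1, '20120105', '20120108', 2)]
print(step_change_groups(rows))
# [(1, '20120101', '20120104', 1), (1, '20120105', '20120108', 2), (1, '20120109', '20120112', 1)]
```

The first result reproduces the overlapping ranges; the second is the desired output.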

What is the cleanest way to achieve this?

asked Apr 11 '12 16:04 by MrEdmundo

2 Answers

There's a (more or less) well-known technique for solving this kind of problem that uses two ROW_NUMBER() calls, like this:

WITH marked AS (
  SELECT
    *,
    grp = ROW_NUMBER() OVER (PARTITION BY product        ORDER BY date)
        - ROW_NUMBER() OVER (PARTITION BY product, price ORDER BY date)
  FROM #ph
)
SELECT
  product,
  date_from = MIN(date),
  date_to   = MAX(date),
  price
FROM marked
GROUP BY
  product,
  price,
  grp
ORDER BY
  product,
  MIN(date)

Output:

product  date_from   date_to        price 
-------  ----------  -------------  ----- 
1        2012-01-01  2012-01-04     1     
1        2012-01-05  2012-01-08     2     
1        2012-01-09  2012-01-12     1
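To see why the difference of the two row numbers identifies each island, here is a small Python sketch that replays the same arithmetic over the sample rows. It assumes the rows are sorted by product and date, and the function names are illustrative only. Within a run of consecutive rows with the same price, both counters go up by one per row, so their difference stays constant; it only jumps when rows with another price intervene. Note that the price-2 island and the second price-1 island both get grp = 4, which is why the query groups by price as well as grp.

```python
from collections import defaultdict

# Sample rows from the question as (product, date, price), sorted by product and date.
rows = [
    (1, "20120101", 1), (1, "20120102", 1), (1, "20120103", 1), (1, "20120104", 1),
    (1, "20120105", 2), (1, "20120106", 2), (1, "20120107", 2), (1, "20120108", 2),
    (1, "20120109", 1), (1, "20120110", 1), (1, "20120111", 1), (1, "20120112", 1),
]


def grp_values(rows):
    """Per row: ROW_NUMBER() by product minus ROW_NUMBER() by (product, price)."""
    rn_product = defaultdict(int)        # PARTITION BY product ORDER BY date
    rn_product_price = defaultdict(int)  # PARTITION BY product, price ORDER BY date
    result = []
    for product, _date, price in rows:
        rn_product[product] += 1
        rn_product_price[(product, price)] += 1
        result.append(rn_product[product] - rn_product_price[(product, price)])
    return result


def islands(rows):
    """GROUP BY product, price, grp with MIN/MAX(date), then ORDER BY product, MIN(date)."""
    groups = {}
    for (product, date, price), grp in zip(rows, grp_values(rows)):
        key = (product, price, grp)
        lo, hi = groups.get(key, (date, date))
        groups[key] = (min(lo, date), max(hi, date))
    result = [(p, lo, hi, price) for (p, price, _grp), (lo, hi) in groups.items()]
    return sorted(result, key=lambda r: (r[0], r[1]))


print(grp_values(rows))
# [0, 0, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4]
print(islands(rows))
```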

answered Sep 20 '22 07:09

Andriy M

I'm new to this forum, so I hope my contribution is helpful.

If you really don't want to use a CTE (although I think that's probably the best approach), you can get a solution using set-based code. You will need to test the performance of this code!

I have added an extra temp table so that I can use a unique identifier for each record, but I suspect you already have such a column in your source table. Here's the temp table:

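    -- Drop the temp table only if this session already created it
    IF OBJECT_ID('tempdb..#phwithId') IS NOT NULL
        DROP TABLE #phwithId;

    CREATE TABLE #phwithId
    (
        SaleId INT
        , ProductID INT
        , Price MONEY
        , SaleDate DATE
    );

    -- Number each product's rows in date order (SaleId restarts at 1 per product)
    INSERT INTO #phwithId (SaleId, ProductID, Price, SaleDate)
    SELECT ROW_NUMBER() OVER (PARTITION BY product ORDER BY [date] ASC) AS SaleId, product, price, [date]
    FROM #ph;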

Now the main body of the SELECT statement:

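    SELECT 
        da1.ProductId
        , da1.date_from
        , da2.date_to
        , da1.Price
    FROM
        (   
            -- Bucket starts: rows whose price differs from the previous row of the same product
            SELECT 
                dfr.ProductId
                , ROW_NUMBER() OVER (PARTITION BY dfr.ProductId ORDER BY dfr.ChangeDate) AS rowno1
                , dfr.ChangeDate AS date_from
                , dfr.Price
            FROM
                (       
                    SELECT
                        sl1.ProductId AS ProductId
                        , sl1.SaleDate AS ChangeDate
                        , sl1.Price
                    FROM
                        #phwithId sl1
                    LEFT JOIN
                        #phwithId sl2
                        ON sl1.ProductId = sl2.ProductId
                        AND sl1.SaleId = sl2.SaleId + 1
                    WHERE
                        sl1.Price <> sl2.Price OR sl2.Price IS NULL
                ) dfr
        ) da1
    LEFT JOIN
        (   
            -- Bucket ends: rows whose price differs from the next row of the same product
            SELECT 
                dto.ProductId
                , ROW_NUMBER() OVER (PARTITION BY dto.ProductId ORDER BY dto.ChangeDate) AS rowno2
                , dto.ChangeDate AS date_to
            FROM
                (   
                    SELECT 
                        sl1.ProductId
                        , sl1.SaleDate AS ChangeDate
                    FROM
                        #phwithId sl1
                    LEFT JOIN
                        #phwithId sl3
                        ON sl1.ProductId = sl3.ProductId
                        AND sl1.SaleId = sl3.SaleId - 1
                    WHERE
                        sl1.Price <> sl3.Price OR sl3.Price IS NULL
                ) dto
        ) da2 
        -- Pair the k-th start with the k-th end of the same product
        ON da1.ProductId = da2.ProductId
        AND da1.rowno1 = da2.rowno2
    ORDER BY
        da1.ProductId
        , da1.date_from;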

By joining the table to itself offset by one record (+1 or -1), we can identify where the price buckets change; then it's just a matter of pairing the start and end dates of each bucket back into a single record.
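A rough Python sketch of the same offset idea follows, assuming the rows are sorted by product and date; `change_point_ranges` is a hypothetical name, not something from the answer. A row starts a bucket when the previous row of the same product has a different price (or there is none), and ends one when the next row of the same product differs (or there is none). The k-th start is then paired with the k-th end, which is what the rowno1 = rowno2 join does.

```python
# Sample rows from the question as (product, date, price), sorted by product and date.
rows = [
    (1, "20120101", 1), (1, "20120102", 1), (1, "20120103", 1), (1, "20120104", 1),
    (1, "20120105", 2), (1, "20120106", 2), (1, "20120107", 2), (1, "20120108", 2),
    (1, "20120109", 1), (1, "20120110", 1), (1, "20120111", 1), (1, "20120112", 1),
]


def change_point_ranges(rows):
    starts, ends = [], []
    for i, (product, date, price) in enumerate(rows):
        # Previous / next row of the same product (None at either edge),
        # mirroring the LEFT JOINs on SaleId + 1 and SaleId - 1.
        prev = rows[i - 1] if i > 0 and rows[i - 1][0] == product else None
        nxt = rows[i + 1] if i + 1 < len(rows) and rows[i + 1][0] == product else None
        if prev is None or prev[2] != price:
            starts.append((product, date, price))  # a price bucket begins here
        if nxt is None or nxt[2] != price:
            ends.append((product, date))           # a price bucket ends here
    # Each product contributes as many starts as ends, in date order, so pairing
    # the k-th start with the k-th end matches rowno1 = rowno2 within a product.
    return [(p, d_from, d_to, price) for (p, d_from, price), (_p, d_to) in zip(starts, ends)]


print(change_point_ranges(rows))
# [(1, '20120101', '20120104', 1), (1, '20120105', '20120108', 2), (1, '20120109', '20120112', 1)]
```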

It's all a bit fiddly, and I'm not sure it's going to give better performance, but I enjoyed the challenge.

answered Sep 24 '22 07:09

redsevi
