Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lead window function with duplicate dates

Sample data here:
http://rextester.com/VNGMF66717

I have the following data:

  ID    Year    Date    
1111    2016    2016-02-28
1111    2016    2016-02-28
1111    2016    2016-03-31
1111    2016    2016-03-31
1111    2016    2016-03-31
1111    2016    2016-04-02
1111    2016    2016-05-31
1111    2016    2016-08-01
1111    2016    2016-12-11
1111    2017    2017-01-02
1111    2017    2017-01-02
1111    2017    2017-02-04
1111    2017    2017-02-04
1111    2017    2017-07-08
2222    2016    2016-02-11
2222    2016    2016-02-11
2222    2016    2016-03-28
2222    2016    2016-03-28
2222    2016    2016-03-28
2222    2016    2016-07-22
2222    2016    2016-12-31
2222    2017    2017-02-01
2222    2017    2017-02-14

I want to add a NextDate column (that resets within each ID and year) using a lead window-function directly in the SELECT statement, instead of a self join using a RANK windowfunction as i did in the provided example.

The NextDate column should be exactly like this

  ID    Year    Date        NextDate
1111    2016    2016-02-28  2016-03-31
1111    2016    2016-02-28  2016-03-31
1111    2016    2016-03-31  2016-04-02
1111    2016    2016-03-31  2016-04-02
1111    2016    2016-03-31  2016-04-02
1111    2016    2016-04-02  2016-05-31
1111    2016    2016-05-31  2016-08-01
1111    2016    2016-08-01  2016-12-11
1111    2016    2016-12-11  NULL
1111    2017    2017-01-02  2017-02-04
1111    2017    2017-01-02  2017-02-04
1111    2017    2017-02-04  2017-07-08
1111    2017    2017-02-04  2017-07-08
1111    2017    2017-07-08  NULL
2222    2016    2016-02-11  2016-03-28
2222    2016    2016-02-11  2016-03-28
2222    2016    2016-03-28  2016-07-22
2222    2016    2016-03-28  2016-07-22
2222    2016    2016-03-28  2016-07-22
2222    2016    2016-07-22  2016-12-31
2222    2016    2016-12-31  NULL
2222    2017    2017-02-01  2017-02-14
2222    2017    2017-02-14  NULL

Anyone know how to do this properly?

like image 415
Emil Vissing Avatar asked Feb 14 '17 11:02

Emil Vissing


People also ask

What is LAG () in SQL?

In SQL Server (Transact-SQL), the LAG function is an analytic function that lets you query more than one row in a table at a time without having to join the table to itself. It returns values from a previous row in the table. To return a value from the next row, try using the LEAD function.

Can we use lead function without over clause?

Just like LAG() , LEAD() is a window function and requires an OVER clause. And as with LAG() , LEAD() must be accompanied by an ORDER BY in the OVER clause.

What is the difference between lead and lag functions?

The LEAD function is used to access data from SUBSEQUENT rows along with data from the current row. The LAG function is used to access data from PREVIOUS rows along with data from the current row. An ORDER BY clause is required when working with LEAD and LAG functions, but a PARTITION BY clause is optional.

What is lag window function?

The LAG window function returns the values for a row at a given offset above (before) the current row in the partition.


2 Answers

This may not be exactly what you meant, but by using two window functions with a nested select : MAX(LEAD), you get the desired result.

select
    id, yr, dt
    ,MAX(nx_dt) OVER (PARTITION BY id, dt) nxt_dt
FROM
    (
        select 
            *
            , LEAD(dt) OVER (PARTITION BY id, yr ORDER BY id, yr, dt) nx_dt
        from
        #testtable
    ) sub

/*
drop table #testtable

create table #testtable
(
    id int, yr int, dt date
)
insert into #testtable
(
    id, yr, dt
)
SELECT 1111,2016,'2016-02-28'   union all
SELECT 1111,2016,'2016-02-28'   union all
SELECT 1111,2016,'2016-03-31'   union all
SELECT 1111,2016,'2016-03-31'   union all
SELECT 1111,2016,'2016-03-31'   union all
SELECT 1111,2016,'2016-04-02'   union all
SELECT 1111,2016,'2016-05-31'   union all
SELECT 1111,2016,'2016-08-01'   union all
SELECT 1111,2016,'2016-12-11'   union all
SELECT 1111,2017,'2017-01-02'   union all
SELECT 1111,2017,'2017-01-02'   union all
SELECT 1111,2017,'2017-02-04'   union all
SELECT 1111,2017,'2017-02-04'   union all
SELECT 1111,2017,'2017-07-08'   union all
SELECT 2222,2016,'2016-02-11'   union all
SELECT 2222,2016,'2016-02-11'   union all
SELECT 2222,2016,'2016-03-28'   union all
SELECT 2222,2016,'2016-03-28'   union all
SELECT 2222,2016,'2016-03-28'   union all
SELECT 2222,2016,'2016-07-22'   union all
SELECT 2222,2016,'2016-12-31'   union all
SELECT 2222,2017,'2017-02-01'   union all
SELECT 2222,2017,'2017-02-14'
*/
like image 90
dunean Avatar answered Sep 21 '22 12:09

dunean


I this you need to do this with outer apply or a subquery:

select cte.*, next_cte.date
from cte outer apply
     (select top 1 cte2.*
      from cte cte2
      where cte2.id = cte.id and cte2.year = cte.year and
            cte2.date > cte.date
      order by cte2.date desc
     ) next_cte;

Another alternative is to do the lead() on distinct values:

select cte.*, cte_next.next_date
from cte join
     (select id, year, date,
             lead(date) over (partition by id, year order by date) as next_date
      from cte
      group by id, year, date
     ) cte_next
     ON cte.id = cte_next.id and cte.year = cte_next.year and cte.date = cte_next.date;
like image 28
Gordon Linoff Avatar answered Sep 20 '22 12:09

Gordon Linoff