Rank records based on 1 column's changing value

Q: How can I rank records based on 1 column's changing value?

I have the following data (https://pastebin.com/vdTb1JRT):

EmployeeID  Date        Onleave
ABH12345    2016-01-01  0
ABH12345    2016-01-02  0
ABH12345    2016-01-03  0
ABH12345    2016-01-04  0
ABH12345    2016-01-05  0
ABH12345    2016-01-06  0
ABH12345    2016-01-07  0
ABH12345    2016-01-08  0
ABH12345    2016-01-09  0
ABH12345    2016-01-10  1
ABH12345    2016-01-11  1
ABH12345    2016-01-12  1
ABH12345    2016-01-13  1
ABH12345    2016-01-14  0
ABH12345    2016-01-15  0
ABH12345    2016-01-16  0
ABH12345    2016-01-17  0

I would like to produce the following results:

 EmployeeID DateValidFrom    DateValidTo     OnLeave
 ABH12345   2016-01-01       2016-01-09      0
 ABH12345   2016-01-10       2016-01-13      1
 ABH12345   2016-01-14       2016-01-17      0

I'm wondering whether I can somehow create a ranked column (as shown below) that increments whenever the value in the Onleave column changes, partitioned by the EmployeeID column.

EmployeeID  Date        Onleave    RankedCol
ABH12345    2016-01-01  0          1
ABH12345    2016-01-02  0          1
ABH12345    2016-01-03  0          1
ABH12345    2016-01-04  0          1
ABH12345    2016-01-05  0          1
ABH12345    2016-01-06  0          1
ABH12345    2016-01-07  0          1
ABH12345    2016-01-08  0          1
ABH12345    2016-01-09  0          1
ABH12345    2016-01-10  1          2
ABH12345    2016-01-11  1          2
ABH12345    2016-01-12  1          2
ABH12345    2016-01-13  1          2
ABH12345    2016-01-14  0          3
ABH12345    2016-01-15  0          3
ABH12345    2016-01-16  0          3
ABH12345    2016-01-17  0          3

Then I would be able to do the following:

SELECT
 [EmployeeID]    = [EmployeeID]
,[DateValidFrom] = MIN([Date])
,[DateValidTo]   = MAX([Date])
,[OnLeave]       = [OnLeave]
FROM table/view/cte/sub-query
GROUP BY 
 [EmployeeID]
,[OnLeave]
,[RankedCol]

Other solutions are very welcome.

Below is the test data:

WITH CTE AS ( SELECT EmployeeID = 'ABH12345', [Date] = CAST(N'2016-01-01' AS Date), [Onleave] = 0
UNION SELECT 'ABH12345', CAST(N'2016-01-02' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-03' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-04' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-05' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-06' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-07' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-08' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-09' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-10' AS Date), 1
UNION SELECT 'ABH12345', CAST(N'2016-01-11' AS Date), 1
UNION SELECT 'ABH12345', CAST(N'2016-01-12' AS Date), 1
UNION SELECT 'ABH12345', CAST(N'2016-01-13' AS Date), 1
UNION SELECT 'ABH12345', CAST(N'2016-01-14' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-15' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-16' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-17' AS Date), 0
)

SELECT * FROM CTE
Emil Vissing asked Jun 03 '17 13:06


2 Answers

Another way to do it uses lag. Fetch the previous Onleave value for each employeeid, flag the rows where it differs from the current value, and take a running sum of those flags to number the groups.

select employeeid, min(date) as date_from, max(date) as date_to, max(onleave) as onleave
from (
      -- running count of value changes: every row of one run gets the same grp
      select t.*,
             sum(case when prev_ol = onleave then 0 else 1 end)
                 over(partition by employeeid order by date) as grp
      from (
            -- previous onleave per employee; defaults to the current value on the first row
            select c.*,
                   lag(onleave, 1, onleave) over(partition by employeeid order by date) as prev_ol
            from cte c
           ) t
     ) t
group by employeeid, grp
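
The same lag-plus-running-sum idea can also produce the RankedCol from the question, numbered from 1. The following is a minimal sketch, assuming SQL Server 2012 or later (for lag and the ordered windowed sum). The CTE names src and flagged, the column changed, and the shortened inline sample (the question's rows for 2016-01-08 to 2016-01-15) are illustrative and not part of the original post.

```sql
-- Shortened copy of the question's data (2016-01-08 .. 2016-01-15), for illustration only.
with src(EmployeeID, [Date], Onleave) as (
    select *
    from (values
        ('ABH12345', cast('2016-01-08' as date), 0),
        ('ABH12345', cast('2016-01-09' as date), 0),
        ('ABH12345', cast('2016-01-10' as date), 1),
        ('ABH12345', cast('2016-01-11' as date), 1),
        ('ABH12345', cast('2016-01-12' as date), 1),
        ('ABH12345', cast('2016-01-13' as date), 1),
        ('ABH12345', cast('2016-01-14' as date), 0),
        ('ABH12345', cast('2016-01-15' as date), 0)
    ) v(EmployeeID, [Date], Onleave)
),
flagged as (
    -- changed = 1 where Onleave differs from the previous row of the same employee;
    -- the first row compares against itself (lag default), so it is never flagged.
    select s.EmployeeID, s.[Date], s.Onleave,
           case when lag(s.Onleave, 1, s.Onleave)
                     over(partition by s.EmployeeID order by s.[Date]) = s.Onleave
                then 0 else 1 end as changed
    from src s
)
select EmployeeID, [Date], Onleave,
       -- running total of change flags, shifted so the first run is numbered 1
       1 + sum(changed) over(partition by EmployeeID order by [Date]
                             rows between unbounded preceding and current row) as RankedCol
from flagged
order by EmployeeID, [Date];
```

For this sample the RankedCol values come out as 1, 1, 2, 2, 2, 2, 3, 3, which can then be used in the GROUP BY from the question.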
Vamsi Prabhala answered Nov 14 '22 23:11


Here is another, somewhat simpler way to get the desired output that reads the table only once.

-- sample of data from your question
with t1(EmployeeID, Date1, Onleave) as(
  select 'ABH12345', cast('2016-01-01' as date),  0 union all
  select 'ABH12345', cast('2016-01-02' as date),  0 union all
  select 'ABH12345', cast('2016-01-03' as date),  0 union all
  select 'ABH12345', cast('2016-01-04' as date),  0 union all
  select 'ABH12345', cast('2016-01-05' as date),  0 union all
  select 'ABH12345', cast('2016-01-06' as date),  0 union all
  select 'ABH12345', cast('2016-01-07' as date),  0 union all
  select 'ABH12345', cast('2016-01-08' as date),  0 union all
  select 'ABH12345', cast('2016-01-09' as date),  0 union all
  select 'ABH12345', cast('2016-01-10' as date),  1 union all
  select 'ABH12345', cast('2016-01-11' as date),  1 union all
  select 'ABH12345', cast('2016-01-12' as date),  1 union all
  select 'ABH12345', cast('2016-01-13' as date),  1 union all
  select 'ABH12345', cast('2016-01-14' as date),  0 union all
  select 'ABH12345', cast('2016-01-15' as date),  0 union all
  select 'ABH12345', cast('2016-01-16' as date),  0 union all
  select 'ABH12345', cast('2016-01-17' as date),  0
)
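-- actual query
select w.employeeid
     , min(w.date1)      as datevalidfrom
     , max(w.date1)      as datevalidto
     , w.onleave
  from (
        -- the difference between the two row numbers stays constant within
        -- each run of consecutive rows that share the same onleave value
        select row_number() over(partition by employeeid order by date1) -
               row_number() over(partition by employeeid, onleave order by date1) as grp
             , employeeid
             , date1
             , onleave
          from t1 s
        ) w
-- grp alone can repeat across employees and across onleave values,
-- so all three columns are needed to identify a run
group by w.employeeid, w.onleave, w.grp
order by w.employeeid, datevalidfrom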

Result:

employeeid datevalidfrom datevalidto onleave
---------- ------------- ----------- -----------
ABH12345   2016-01-01    2016-01-09  0
ABH12345   2016-01-10    2016-01-13  1
ABH12345   2016-01-14    2016-01-17  0
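
Why the difference of the two row numbers identifies a run: within a run of consecutive equal Onleave values both counters advance by one per row, so their difference stays constant, and it only shifts once rows with the other value have intervened. The sketch below exposes the intermediate values. It assumes SQL Server 2008 or later (for the VALUES row constructor); the names src, rn_all and rn_val and the shortened sample are illustrative, not part of the original answer.

```sql
-- Shortened copy of the question's data (2016-01-08 .. 2016-01-15), for illustration only.
with src(EmployeeID, Date1, Onleave) as (
    select *
    from (values
        ('ABH12345', cast('2016-01-08' as date), 0),
        ('ABH12345', cast('2016-01-09' as date), 0),
        ('ABH12345', cast('2016-01-10' as date), 1),
        ('ABH12345', cast('2016-01-11' as date), 1),
        ('ABH12345', cast('2016-01-12' as date), 1),
        ('ABH12345', cast('2016-01-13' as date), 1),
        ('ABH12345', cast('2016-01-14' as date), 0),
        ('ABH12345', cast('2016-01-15' as date), 0)
    ) v(EmployeeID, Date1, Onleave)
)
select EmployeeID, Date1, Onleave,
       -- position among all rows of the employee
       row_number() over(partition by EmployeeID order by Date1)          as rn_all,
       -- position among the employee's rows with the same Onleave value
       row_number() over(partition by EmployeeID, Onleave order by Date1) as rn_val,
       -- constant within each run of consecutive equal Onleave values
       row_number() over(partition by EmployeeID order by Date1)
     - row_number() over(partition by EmployeeID, Onleave order by Date1) as grp
from src
order by EmployeeID, Date1;
```

For this sample grp comes out as 0, 0, 2, 2, 2, 2, 4, 4. Note that grp is only guaranteed to be distinct among rows that share the same EmployeeID and Onleave value (a sequence such as 1, 0, 1 yields the same grp for the second and third rows), so those two columns belong in the GROUP BY alongside grp.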
Nick Krasnov answered Nov 14 '22 23:11