Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dense Rank with order by

I have Assignment Table like this

EMPLID | RCD | COMPANY |   EFFDT       |  SALARY
---------------------------------------------------
100    | 0   | xyz     |   1/1/2000    |    1000
100    | 0   | xyz     |   1/15/2000   |    1100
100    | 0   | xyz     |   1/31/2000   |    1200
100    | 0   | ggg     |   2/15/2000   |    1500
100    | 1   | abc     |   3/1/2000    |    2000
100    | 1   | abc     |   4/1/2000    |    2100

I need a counter which should increase whenever RCD or Company combination changes and it should be order by effdt.

EMPLID | RCD | COMPANY |   EFFDT       |  SALARY     | COUNTER
-------|-----|---------|---------------|-------------|----------
100    | 0   | xyz     |   1/1/2000    |    1000     | 1
100    | 0   | xyz     |   1/15/2000   |    1100     | 1
100    | 0   | xyz     |   1/31/2000   |    1200     | 1
100    | 0   | ggg     |   2/15/2000   |    1500     | 2
100    | 1   | abc     |   3/1/2000    |    2000     | 3
100    | 1   | abc     |   4/1/2000    |    2100     | 3

I tried Dense_Rank function with order by EMPLID , RCD , COMPANY , It provides me Counter but its not in order by effdt.

SELECT EMPLID,RCD,COMPANY,EFFDT,
    DENSE_RANK() over (order by EMPLID , RCD , COMPANY) AS COUNTER
FROM ASSIGNMENT ;

Order by EFFDT , Gives incremental counter 1 ... 6

SELECT EMPLID,RCD,COMPANY,EFFDT,
  DENSE_RANK() over (order by EFFDT) AS COUNTER 
FROM ASSIGNMENT;

Kindly help me to find out what I am missing.

like image 282
Bhushan Avatar asked Nov 03 '16 07:11

Bhushan


People also ask

Can we use dense rank without ORDER BY?

Data does not have an order. You can't expect it to be returned in the same order unless you specify an order by -- so trying to implement a rank without order by doesn't make sense.

Can we use group by with dense rank?

That is, the dense_rank() is performed after the GROUP BY . This is totally allowed.

What is DENSE_RANK () in SQL?

DENSE_RANK computes the rank of a row in an ordered group of rows and returns the rank as a NUMBER . The ranks are consecutive integers beginning with 1. The largest rank value is the number of unique values returned by the query.

What is the difference between the rank () and the DENSE_RANK () functions?

rank and dense_rank are similar to row_number , but when there are ties, they will give the same value to the tied values. rank will keep the ranking, so the numbering may go 1, 2, 2, 4 etc, whereas dense_rank will never give any gaps.


2 Answers

Try LAG

WITH flagged AS (  
    SELECT *, 
      CASE WHEN LAG(RCD) OVER(PARTITION BY EMPLID ORDER BY EFFDT) = RCD 
              AND LAG(COMPANY) OVER(PARTITION BY EMPLID ORDER BY EFFDT) = COMPANY THEN 0 ELSE 1 END strtFlag
    FROM tbl
    )

SELECT EMPLID, RCD, COMPANY, EFFDT, SALARY, SUM(strtFlag) OVER(PARTITION BY EMPLID ORDER BY EFFDT) COUNTER
FROM flagged

alternatively, with DENSE_RANK() of group

WITH grps AS (  
    SELECT *, 
      ROW_NUMBER() OVER(PARTITION BY EMPLID ORDER BY EFFDT) -
      ROW_NUMBER() OVER(PARTITION BY EMPLID, RCD, COMPANY ORDER BY EFFDT) grp
    FROM tbl
    )

SELECT EMPLID, RCD, COMPANY, EFFDT, SALARY
  , DENSE_RANK() OVER(PARTITION BY EMPLID ORDER BY grp) COUNTER
FROM grps

Anyway looks like two steps are needed to get dense numbering.

like image 200
Serg Avatar answered Sep 22 '22 14:09

Serg


This should work - with the clarification that a combination of rcd and company should keep the same "counter" even if it appears in non-consecutive periods. I added to more rows to the test data to make sure I get the correct result.

Like Serg's solutions (which answer a different question), the solution does one pass over the base data, and then a second pass over the results of the first pass (all in memory, so it should be relatively fast). There's no way around that - this requires two different analytic functions where one depends on the results of the other, and nested analytic functions are not allowed. (This part of the answer addresses a comment by the OP to the Answer by Serg.)

with
     test_data ( emplid, rcd, company, effdt, salary ) as (
       select 100, 0, 'xyz', to_date('1/1/2000' , 'mm/dd/yyyy'), 1000 from dual union all
       select 100, 0, 'xyz', to_date('1/15/2000', 'mm/dd/yyyy'), 1100 from dual union all
       select 100, 0, 'xyz', to_date('1/31/2000', 'mm/dd/yyyy'), 1200 from dual union all
       select 100, 0, 'ggg', to_date('2/15/2000', 'mm/dd/yyyy'), 1500 from dual union all
       select 100, 1, 'abc', to_date('3/1/2000' , 'mm/dd/yyyy'), 2000 from dual union all
       select 100, 1, 'abc', to_date('4/1/2000' , 'mm/dd/yyyy'), 2100 from dual union all
       select 100, 0, 'xyz', to_date('5/1/2000' , 'mm/dd/yyyy'), 2200 from dual union all
       select 100, 1, 'ggg', to_date('8/15/2000', 'mm/dd/yyyy'), 2300 from dual
     )
-- end of test data; the actual solution (SQL query) begins below this line
select emplid, rcd, company, effdt, salary,
       dense_rank() over (partition by emplid order by min_dt) as counter
from ( select emplid, rcd, company, effdt, salary, 
              min(effdt) over (partition by emplid, rcd, company) as min_dt
       from   test_data )
order by effdt                --   ORDER BY is optional
;

    EMPLID        RCD COM EFFDT                   SALARY    COUNTER
---------- ---------- --- ------------------- ---------- ----------
       100          0 xyz 2000-01-01 00:00:00       1000          1
       100          0 xyz 2000-01-15 00:00:00       1100          1
       100          0 xyz 2000-01-31 00:00:00       1200          1
       100          0 ggg 2000-02-15 00:00:00       1500          2
       100          1 abc 2000-03-01 00:00:00       2000          3
       100          1 abc 2000-04-01 00:00:00       2100          3
       100          0 xyz 2000-05-01 00:00:00       2200          1
       100          1 ggg 2000-08-15 00:00:00       2300          4

 8 rows selected
like image 26
mathguy Avatar answered Sep 19 '22 14:09

mathguy