Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to Dense Rank Sets of data

I am trying to get a dense rank to group sets of data together. In my table I have ID, GRP_SET, SUB_SET, and INTERVAL which simply represents a date field. When records are inserted using an ID they get inserted as GRP_SETs of 3 rows shown as a SUB_SET. As you can see when inserts happen the interval can change slightly before it finishes inserting the set.

Here is some example data and the DRANK column represents what ranking I'm trying to get.

with q as (
select 1 id, 'a' GRP_SET, 1 as SUB_SET, 123 as interval, 1 as DRANK from dual union all
select 1, 'a', 2, 123, 1 from dual union all
select 1, 'a', 3, 124, 1 from dual union all
select 1, 'b', 1, 234, 2 from dual union all
select 1, 'b', 2, 235, 2 from dual union all
select 1, 'b', 3, 235, 2 from dual union all
select 1, 'a', 1, 331, 3 from dual union all
select 1, 'a', 2, 331, 3 from dual union all
select 1, 'a', 3, 331, 3 from dual)

select * from q

Example Data

ID GRP_SET SUBSET INTERVAL DRANK
1  a       1      123      1
1  a       2      123      1
1  a       3      124      1
1  b       1      234      2
1  b       3      235      2
1  b       2      235      2
1  a       1      331      3
1  a       2      331      3
1  a       3      331      3

Here is the query I Have that gets close but I seem to need something like a:

  • Partition By: ID
  • Order within partition by: ID, Interval
  • Change Rank when: ID, GRP_SET (change)

select
   id, GRP_SET, SUB_SET, interval,
   DENSE_RANK() over (partition by ID order by id, GRP_SET) as DRANK_TEST
from q
Order by
   id, interval
like image 329
Cullen Burton Avatar asked Sep 29 '22 23:09

Cullen Burton


2 Answers

Using the MODEL clause

Behold for you are pushing your requirements beyond the limits of what is easy to express in "ordinary" SQL. But luckily, you're using Oracle, which features the MODEL clause, a device whose mystery is only exceeded by its power (excellent whitepaper here). You shall write:

SELECT
   id, grp_set, sub_set, interval, drank
FROM (
  SELECT id, grp_set, sub_set, interval, 1 drank
  FROM q
)
MODEL PARTITION BY (id)
      DIMENSION BY (row_number() OVER (ORDER BY interval, sub_set) rn)
      MEASURES (grp_set, sub_set, interval, drank)
      RULES (
        drank[any] = NVL(drank[cv(rn) - 1] + 
                         DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)
      )

Proof on SQLFiddle

Explanation:

SELECT
   id, grp_set, sub_set, interval, drank
FROM (
  -- Here, we initialise your "dense rank" to 1
  SELECT id, grp_set, sub_set, interval, 1 drank
  FROM q
)

-- Then we partition the data set by ID (that's your requirement)
MODEL PARTITION BY (id)

-- We generate row numbers for all columns ordered by interval and sub_set,
-- such that we can then access row numbers in that particular order
      DIMENSION BY (row_number() OVER (ORDER BY interval, sub_set) rn)

-- These are the columns that we want to generate from the MODEL clause
      MEASURES (grp_set, sub_set, interval, drank)

-- And the rules are simple: Each "dense rank" value is equal to the
-- previous "dense rank" value + 1, if the grp_set value has changed
      RULES (
        drank[any] = NVL(drank[cv(rn) - 1] + 
                         DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)
      )

Of course, this only works if there are no interleaving events, i.e. there is no other grp_set than a between 123 and 124

like image 178
Lukas Eder Avatar answered Oct 07 '22 16:10

Lukas Eder


This might work for you. The complicating factor is that you want the same "DENSE RANK" for intervals 123 and 124 and for intervals 234 and 235. So we'll truncate them to the nearest 10 for purposes of ordering the DENSE_RANK() function:

SELECT id, grp_set, sub_set, interval, drank
     , DENSE_RANK() OVER ( PARTITION BY id ORDER BY TRUNC(interval, -1), grp_set ) AS drank_test
  FROM q

Please see SQL Fiddle demo here.

If you want the intervals to be even closer together in order to be grouped together, then you can multiply the value before truncating. This would group them by 3s (but maybe you don't need them so granular):

SELECT id, grp_set, sub_set, interval, drank
     , DENSE_RANK() OVER ( PARTITION BY id ORDER BY TRUNC(interval*10/3, -1), grp_set ) AS drank_test
  FROM q
like image 25
David Faber Avatar answered Oct 07 '22 17:10

David Faber