Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Oracle SQL sequential group number assignment

I have a result set that for simplicity I will refer to as table "tab" of three columns: Category, Subcategory and Date, ordered by category and then date. This dataset is a grid and I wish to perform other processing on top of that grid. My problem is uniquely identifying (or sequentially labelling) groups within the dataset. The SQL below is what I am after (either GID1 or GID2 would do), based on the existence of the first 3 columns. I have attempted group_id, grouping_id, rank, dense_rank and either missed a trick with one of these or am attempting something very awkward. It is not important what order the GID is, but is important that the group-number assignment is based on the data as ordered (category then date).

 CREATE TABLE Tab
        ("Category" varchar2(1), "SubCategory" varchar2(7), "Date" int, "GID1" int, "GID2" int);

    INSERT ALL 
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'bannana', 20120101, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'grape', 20120102, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'pear', 20120103, 1, 1)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'pear', 20120104, 1, 1)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'bannana', 20120105, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'pear', 20120106, 2, 2)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'pear', 20120107, 2, 2)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'apple', 20120108, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'pear', 20120109, 3, 3)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'apple', 20120101, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'bannana', 20120102, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'apple', 20120103, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'bannana', 20120104, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'pear', 20120105, 1, 4)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'pear', 20120106, 1, 4)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'pear', 20120107, 1, 4)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'pear', 20120108, 1, 4)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'pear', 20120109, 1, 4)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'grape', 20120101, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'grape', 20120102, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'apple', 20120103, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'bannana', 20120104, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'grape', 20120105, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'pear', 20120106, 1, 5)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'apple', 20120107, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'apple', 20120108, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'apple', 20120109, NULL, NULL)
    SELECT * FROM dual
    ;
like image 260
CupOfTea Avatar asked Mar 05 '26 22:03

CupOfTea


1 Answers

ok if its just pears then:

SQL> select "Category", "SubCategory", "Date",
  2         case
  3           when "SubCategory" = 'pear'
  4           then
  5             count(rn) over (partition by "Category" order by "Date") else null
  6         end GID1 ,
  7         case
  8           when "SubCategory" = 'pear'
  9           then
 10             count(rn) over ( order by "Category", "Date") else null
 11         end GID2
 12    from (select "Category", "SubCategory", "Date", lag("SubCategory") over (partition by "Category" order by "Date"),
 13                                    case
 14                                      when lag("SubCategory") over (partition by "Category" order by "Date") != "SubCategory"
 15                                      and "SubCategory" = 'pear'
 16                                       then 1
 17                                      when row_number() over (partition by "Category" order by "Date") = 1 and "SubCategory" = 'pear' then 1
 18                                      else null
 19                                    end rn
 20                               from tab)
 21   order by 1, 3;

Category   SubCate       Date       GID1       GID2
---------- ------- ---------- ---------- ----------
A          bannana   20120101
A          grape     20120102
A          pear      20120103          1          1
A          pear      20120104          1          1
A          bannana   20120105
A          pear      20120106          2          2
A          pear      20120107          2          2
A          apple     20120108
A          pear      20120109          3          3
B          apple     20120101
B          bannana   20120102
B          apple     20120103
B          bannana   20120104
B          pear      20120105          1          4
B          pear      20120106          1          4
B          pear      20120107          1          4
B          pear      20120108          1          4
B          pear      20120109          1          4
C          grape     20120101
C          grape     20120102
C          apple     20120103
C          bannana   20120104
C          grape     20120105
C          pear      20120106          1          5
C          apple     20120107
C          apple     20120108
C          apple     20120109

to break this down.

we look at the prior row ordered by "Date" (for each "Category") and see if it was a different "SubCategory" and also that the current cateogry = pear. if so we tag the row with a "1" (irrelevant what we use, just NON NULL).

lag("SubCategory") over (partition by "Category" order by "Date") != "SubCategory" 
 and "SubCategory" = 'pear'

also for the first row we assign the same. this give us:

Category   SubCate       Date LAG("SU         RN
---------- ------- ---------- ------- ----------
A          bannana   20120101
A          grape     20120102 bannana
A          pear      20120103 grape            1
A          pear      20120104 pear
A          bannana   20120105 pear
A          pear      20120106 bannana          1
A          pear      20120107 pear
A          apple     20120108 pear
A          pear      20120109 apple            1
B          apple     20120101
B          bannana   20120102 apple
B          apple     20120103 bannana
B          bannana   20120104 apple
B          pear      20120105 bannana          1
B          pear      20120106 pear
B          pear      20120107 pear
B          pear      20120108 pear
B          pear      20120109 pear
C          grape     20120101
C          grape     20120102 grape
C          apple     20120103 grape
C          bannana   20120104 apple
C          grape     20120105 bannana
C          pear      20120106 grape            1
C          apple     20120107 pear
C          apple     20120108 apple
C          apple     20120109 apple

now, simply we count() the not null "RN" values ordering again on Date (per category for GID1, and not for GID2[gid2 we order by it too!). which is these lines: count(rn) over (partition by "Category" order by "Date") (GID1)

and count(rn) over ( order by "Category", "Date") (GID2)

like image 140
DazzaL Avatar answered Mar 08 '26 10:03

DazzaL



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!