I have a result set that for simplicity I will refer to as table "tab" of three columns: Category, Subcategory and Date, ordered by category and then date. This dataset is a grid and I wish to perform other processing on top of that grid. My problem is uniquely identifying (or sequentially labelling) groups within the dataset. The SQL below is what I am after (either GID1 or GID2 would do), based on the existence of the first 3 columns. I have attempted group_id, grouping_id, rank, dense_rank and either missed a trick with one of these or am attempting something very awkward. It is not important what order the GID is, but is important that the group-number assignment is based on the data as ordered (category then date).
CREATE TABLE Tab
("Category" varchar2(1), "SubCategory" varchar2(7), "Date" int, "GID1" int, "GID2" int);
INSERT ALL
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('A', 'bannana', 20120101, NULL, NULL)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('A', 'grape', 20120102, NULL, NULL)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('A', 'pear', 20120103, 1, 1)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('A', 'pear', 20120104, 1, 1)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('A', 'bannana', 20120105, NULL, NULL)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('A', 'pear', 20120106, 2, 2)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('A', 'pear', 20120107, 2, 2)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('A', 'apple', 20120108, NULL, NULL)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('A', 'pear', 20120109, 3, 3)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('B', 'apple', 20120101, NULL, NULL)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('B', 'bannana', 20120102, NULL, NULL)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('B', 'apple', 20120103, NULL, NULL)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('B', 'bannana', 20120104, NULL, NULL)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('B', 'pear', 20120105, 1, 4)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('B', 'pear', 20120106, 1, 4)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('B', 'pear', 20120107, 1, 4)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('B', 'pear', 20120108, 1, 4)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('B', 'pear', 20120109, 1, 4)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('C', 'grape', 20120101, NULL, NULL)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('C', 'grape', 20120102, NULL, NULL)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('C', 'apple', 20120103, NULL, NULL)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('C', 'bannana', 20120104, NULL, NULL)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('C', 'grape', 20120105, NULL, NULL)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('C', 'pear', 20120106, 1, 5)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('C', 'apple', 20120107, NULL, NULL)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('C', 'apple', 20120108, NULL, NULL)
INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
VALUES ('C', 'apple', 20120109, NULL, NULL)
SELECT * FROM dual
;
ok if its just pears then:
SQL> select "Category", "SubCategory", "Date",
2 case
3 when "SubCategory" = 'pear'
4 then
5 count(rn) over (partition by "Category" order by "Date") else null
6 end GID1 ,
7 case
8 when "SubCategory" = 'pear'
9 then
10 count(rn) over ( order by "Category", "Date") else null
11 end GID2
12 from (select "Category", "SubCategory", "Date", lag("SubCategory") over (partition by "Category" order by "Date"),
13 case
14 when lag("SubCategory") over (partition by "Category" order by "Date") != "SubCategory"
15 and "SubCategory" = 'pear'
16 then 1
17 when row_number() over (partition by "Category" order by "Date") = 1 and "SubCategory" = 'pear' then 1
18 else null
19 end rn
20 from tab)
21 order by 1, 3;
Category SubCate Date GID1 GID2
---------- ------- ---------- ---------- ----------
A bannana 20120101
A grape 20120102
A pear 20120103 1 1
A pear 20120104 1 1
A bannana 20120105
A pear 20120106 2 2
A pear 20120107 2 2
A apple 20120108
A pear 20120109 3 3
B apple 20120101
B bannana 20120102
B apple 20120103
B bannana 20120104
B pear 20120105 1 4
B pear 20120106 1 4
B pear 20120107 1 4
B pear 20120108 1 4
B pear 20120109 1 4
C grape 20120101
C grape 20120102
C apple 20120103
C bannana 20120104
C grape 20120105
C pear 20120106 1 5
C apple 20120107
C apple 20120108
C apple 20120109
to break this down.
we look at the prior row ordered by "Date" (for each "Category") and see if it was a different "SubCategory" and also that the current cateogry = pear. if so we tag the row with a "1" (irrelevant what we use, just NON NULL).
lag("SubCategory") over (partition by "Category" order by "Date") != "SubCategory"
and "SubCategory" = 'pear'
also for the first row we assign the same. this give us:
Category SubCate Date LAG("SU RN
---------- ------- ---------- ------- ----------
A bannana 20120101
A grape 20120102 bannana
A pear 20120103 grape 1
A pear 20120104 pear
A bannana 20120105 pear
A pear 20120106 bannana 1
A pear 20120107 pear
A apple 20120108 pear
A pear 20120109 apple 1
B apple 20120101
B bannana 20120102 apple
B apple 20120103 bannana
B bannana 20120104 apple
B pear 20120105 bannana 1
B pear 20120106 pear
B pear 20120107 pear
B pear 20120108 pear
B pear 20120109 pear
C grape 20120101
C grape 20120102 grape
C apple 20120103 grape
C bannana 20120104 apple
C grape 20120105 bannana
C pear 20120106 grape 1
C apple 20120107 pear
C apple 20120108 apple
C apple 20120109 apple
now, simply we count() the not null "RN" values ordering again on Date (per category for GID1, and not for GID2[gid2 we order by it too!). which is these lines:
count(rn) over (partition by "Category" order by "Date") (GID1)
and
count(rn) over ( order by "Category", "Date") (GID2)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With