Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL combine multiple identifiers to create a group id for duplicate records

I'm working on a problem in Oracle that I'm struggling to solve 'elegantly'.

I have a data extract with three different identifiers: A, B, C

Each identifier may appear in more than one row, and each row may have one or more of these three identifiers (i.e the column is populated or null).

I want to be able to group all records that have any combination of either A, B or C in common and assign them the same group id.

Extract table showing what the eventual groups should be:

Rownum | A    | B    | C    | End group
1        p      NULL   NULL   1
2        p      r      NULL   1
3        q      NULL   NULL   2
4        NULL   r      NULL   1
5        NULL   NULL   s      2
6        q      NULL   s      2

My original approach was to assign a guid to each row in the extract and create a lookup table for the three identifiers:

GUID | IDENTIFIER | IDENTIFIER TYPE | GROUP | END GROUP
1      p            A                 1       1
2      p            A                 1       1
2      r            B                 2       1
3      q            A                 3       3
4      r            B                 2       1
5      s            C                 4       3
6      q            A                 3       3
6      s            C                 4       3

Then group by identifier and assign a group number. The groups, however, need to be combined where possible to provide the view shown in end group.

The only solution I can think of for this problem is to use loops, which I'd rather avoid.

Any ideas would be greatly appreciated.

Niall

like image 804
eageranalyst Avatar asked Dec 18 '25 08:12

eageranalyst


1 Answers

This is truly an interesting problem. Still, I think we are missing a definition of a "group". Since in your example (p,null,null) (row1) and (null,r,null) (row4) share no common identifier and belong to the same group I'll go with this definition for grouping:

A row belongs to a group if it shares at least one identifier with at least one row of this group.

This means we can "chain" rows. This naturally leads to a hierarchical solution:

SQL> SELECT ID, a, b, c, MIN(grp) grp
  2    FROM (SELECT connect_by_root(id) ID,
  3                 connect_by_root(a) a,
  4                 connect_by_root(b) b,
  5                 connect_by_root(c) c,
  6                 ID grp
  7             FROM a
  8           CONNECT BY NOCYCLE(PRIOR a = a
  9                           OR PRIOR b = b
 10                           OR PRIOR c = c))
 11   GROUP BY ID, a, b, c
 12   ORDER BY ID;

        ID A          B          C                 GRP
---------- ---------- ---------- ---------- ----------
         1 p                                         1
         2 p          r                              1
         3 q                                         3
         4            r                              1
         5                       s                   3
         6 q                     s                   3

6 rows selected

You can execute the subquery to understand the construction:

SQL> SELECT connect_by_root(id) ID,
  2         connect_by_root(a) a,
  3         connect_by_root(b) b,
  4         connect_by_root(c) c,
  5         substr(sys_connect_by_path(ID, '->'), 3) path,
  6         ID grp
  7    FROM a
  8  CONNECT BY NOCYCLE(a = PRIOR a
  9                  OR b = PRIOR b
 10                  OR c = PRIOR c);

        ID A          B          C          PATH            GRP
---------- ---------- ---------- ---------- -------- ----------
         1 p                                1                 1
         1 p                                1->2              2
         1 p                                1->2->4           4
         2 p          r                     2                 2
         2 p          r                     2->1              1
         2 p          r                     2->4              4
         3 q                                3                 3
         3 q                                3->6              6
         3 q                                3->6->5           5
         4            r                     4                 4
         4            r                     4->2              2
         4            r                     4->2->1           1
         5                       s          5                 5
         5                       s          5->6              6
         5                       s          5->6->3           3
         6 q                     s          6                 6
         6 q                     s          6->3              3
         6 q                     s          6->5              5

18 rows selected
like image 184
Vincent Malgrat Avatar answered Dec 21 '25 05:12

Vincent Malgrat



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!