I have a weird grouping scenario and have some troubles finding out what would be the best way for grouping in SQL.
Imagine we have the following one table
CREATE TABLE Item
(
KeyId VARCHAR(1) NOT NULL,
Col1 INT NULL,
Col2 INT NULL,
Col3 INT NULL
)
GO
INSERT INTO Item (KeyId, Col1, Col2, Col3)
VALUES
('a',1,2,3),
('b',5,4,3),
('c',5,7,6),
('d',8,7,9),
('e',11,10,9),
('f',11,12,13),
('g',20,22,21),
('h',23,22,24)
I need to group records in this table so that if Col1 OR Col2 OR Col3 is the same for two records, then these two records should be in the same group, and there should be chaining. In other words, with the data as above record 'a' (first record) has Col3 = 3 and record 'b' (second record) has also Col3 = 3, so these two should be in one group. But then record 'b' has the same Col1 as record 'c', so record 'c' should be in the same group as 'a' and 'b'. And then record 'd' has the same Col2 as in 'c', so this should also be in the same group. Similarly 'e' and 'f' has the same values in Col3 and Col1 respectively.
On the other hand records 'g' and 'h' will be in one group (because they have the same Col2 = 22), but this group will be different from the group for records 'a','b','c','d','e','f'.
The result of the query should be something like
KeyId GroupId
'a' 1
'b' 1
'c' 1
'd' 1
'e' 1
'f' 1
'g' 2
'h' 2
There is probably a way of doing this with some loops/cursors, but I started thinking about cleaner way and this seems quite difficult.
Here you go:
with g (rootid, previd, level, keyid, col1, col2, col3) as (
select keyid, '-', 1, keyid, col1, col2, col3 from item
union all
select g.rootid, g.keyid, g.level + 1, i.keyid, i.col1, i.col2, i.col3
from g
join item i on i.col1 = g.col1 or i.col2 = g.col2 or i.col3 = g.col3
where i.keyid > g.keyid
),
m (keyid, rootid) as (
select keyid, min(rootid) from g group by keyid
)
select * from m;
Result:
keyid rootid
----- ------
a a
b a
c a
d a
e a
f a
g g
h g
Note: Keep in mind that SQL Server has by default a limit of 100 iterations (number of rows per group) when processing recursive CTEs. In English: even though it's possible to do this as shown above, there are clear limitations to what SQL Server can process. If you reach this limit you'll get the message:
The maximum recursion 100 has been exhausted before statement completion.
If this happens consider adding the clause option (maxrecursion 32767)
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With