I have a table with ID pairs that are in a transitive relation t, that is, if "A t B" AND "B t C" then "A t C". Sample:
table T1
ID1 | ID2
1 | 2
1 | 5
4 | 7
7 | 8
9 | 1
So there are two groups,
g1
: {1,2,5,9} because "1 t 2", "1 t 5" and "9 t 1"g2
: {4,7,8} because "4 t 7" and "7 t 8"and I need to produce, by "pure and standard SQL", a new table or view:
table T2
ID1 | ID2 | LABEL
1 | 2 | 1
1 | 5 | 1
4 | 7 | 2
7 | 8 | 2
9 | 1 | 1
PS-1: we can list the "transitive groups" by
SELECT DISTINCT label, id
FROM (SELECT id1 as id, * FROM T2) UNION (SELECT id2 as id, * FROM T2)
ORDER BY 1,2;
PS-2: I am using PostgreSQL 9.1, but if there are a solution with "standard SQL" I prefer.
You can do this in Postgres; you cannot do this in all databases. Here is the query:
with
recursive cte(id1, id2) as (
select id1, id2, 1 as level
from t
union all
select t.id1, cte.id2, cte.level + 1
from t join
cte
on t.id2 = cte.id1
)
select id1, id2,
dense_rank() over (order by grp) as label
from (select id1, id2,
least(min(id2) over (partition by id1), min(id1) over (partition by id2)) as grp,
level
from cte
) t
where level = 1;
With the SQL Fiddle here.
You are walking through a tree structure in order to assign the label (cycles might pose problems with this particular version by the way). In Postgres, you can do this using an explicit recursive
CTE. In SQL Server, you can do this with a CTE that is implicitly "recursive" (the key word is not used). In Oracle, you can do this with connect by
.
The recursive CTE gets all pairs that are connected to each other. The main query then assigns the minimum value of id1 and id2 to the pair, to identify all pairs that are connected to each other. The final label is produced just by assigning a sequential value to the grp
.
EDIT:
Egor makes a very good point. The above is assuming that the ids "descend" to the smaller values. The following version instead uses the highest level for each id for the grouping (which is really what is intended):
with
recursive cte(id1, id2) as (
select id1, id2, 1 as level
from t
union all
select t.id1, cte.id2, cte.level + 1
from t join
cte
on t.id2 = cte.id1
-- where not exists (select 1 from cte cte2 where cte2.id1 = t.id1 and cte2.id2 = t.id2)
)
select id1, id2,
dense_rank() over (order by topvalue) as label
from (select id1, id2,
first_value(id2) over (partition by id1 order by level desc) as topvalue,
level
from cte
) t
where level = 1;
EDIT II:
In response to Egor's second comment. This data is a little problematic with respect to the original problem. The following breaks it into two pieces:
with
recursive cte as (
select id1, id2, id2 as last, id1||','||id2 as grp, 1 as level
from t
where id2 not in (select id1 from t)
union all
select t.id1, t.id2, cte.last, cte.grp, cte.level + 1
from t join
cte
on t.id2 = cte.id1
-- where not exists (select 1 from cte cte2 where cte2.id1 = t.id1 and cte2.id2 = t.id2)
)
select *
from cte;
But, it is not clear if that is what the original wanted. It would break the original into three groups that overlap, because there are three ids in the second column that are never in the first column. The question here is about commutativity.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With