
SQL grouping intersecting/overlapping rows

Tags:

sql

postgresql

recursive-query

common-table-expression

I have the following table in Postgres that has overlapping data in the two columns a_sno and b_sno.

create table data
( a_sno integer not null,  
  b_sno integer not null,
  PRIMARY KEY (a_sno,b_sno)
);

insert into data (a_sno,b_sno) values
  ( 4, 5 )
, ( 5, 4 )
, ( 5, 6 )
, ( 6, 5 )
, ( 6, 7 )
, ( 7, 6 )
, ( 9, 10)
, ( 9, 13)
, (10, 9 )
, (13, 9 )
, (10, 13)
, (13, 10)
, (10, 14)
, (14, 10)
, (13, 14)
, (14, 13)
, (11, 15)
, (15, 11);

As you can see, in the first 6 rows the values 4, 5, 6 and 7 in the two columns intersect/overlap and need to be partitioned into one group. The same goes for rows 7-16 and rows 17-18, which will be labeled as group 2 and 3 respectively.

The resulting output should look like this:

group | value
------+------
1     | 4
1     | 5
1     | 6
1     | 7
2     | 9
2     | 10
2     | 13
2     | 14
3     | 11
3     | 15

asked Apr 19 '15 19:04

bogeyman

1 Answer

This assumes that every pair also exists in its mirrored combination, e.g. (4,5) and (5,4). The following solutions work just as well without the mirrored duplicates.
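
The mirrored-pair assumption can be checked up front. A minimal sketch, using only the table from the question (the aliases d and m are made up for illustration): it lists every pair whose mirror is missing.

```sql
-- List pairs (a, b) for which the mirrored row (b, a) does not exist.
SELECT d.a_sno, d.b_sno
FROM   data d
WHERE  NOT EXISTS (
   SELECT 1 FROM data m
   WHERE  m.a_sno = d.b_sno
   AND    m.b_sno = d.a_sno
   );
```

For the sample data in the question this returns no rows, since every pair is stored in both directions.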

Simple case

If all connections can be lined up in a single ascending sequence and complications like the ones I added in the fiddle are not possible, we can use this solution without duplicates in the rCTE:

I start by getting the minimum a_sno per group, with the minimum associated b_sno:

SELECT row_number() OVER (ORDER BY a_sno) AS grp
     , a_sno, min(b_sno) AS b_sno
FROM   data d
WHERE  a_sno < b_sno
AND    NOT EXISTS (
   SELECT 1 FROM data
   WHERE  b_sno = d.a_sno
   AND    a_sno < b_sno
   )
GROUP  BY a_sno;

This only needs a single query level since a window function can be built on an aggregate:

Related: Get the distinct sum of a joined table column
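
To illustrate the point about window functions over aggregates in isolation, here is a small sketch against the same table. The column aliases neighbors and rn are made up for this example. Window functions are evaluated after GROUP BY, so count(*) can be used directly inside the OVER clause.

```sql
-- Aggregate per a_sno, then number the aggregated rows in the same query level.
SELECT a_sno
     , count(*) AS neighbors                                    -- aggregate
     , row_number() OVER (ORDER BY count(*) DESC, a_sno) AS rn  -- window function over the aggregate
FROM   data
GROUP  BY a_sno
ORDER  BY rn;
```

No subquery is needed to separate the two steps.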

Result:

grp  a_sno  b_sno
1    4      5
2    9      10
3    11     15

I avoid branches and duplicated (multiplied) rows, which are potentially much more expensive with long chains. I use ORDER BY b_sno LIMIT 1 in a correlated subquery to make this fly in a recursive CTE.

Related: Create a unique index on a non-unique column
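
A single step of that lookup, written out on its own, may make the idea clearer. This is only a sketch: the literals 10 and 7 are example nodes standing in for the value carried along by the recursion, and next_sno is a made-up alias.

```sql
-- Next greater neighbor of node 10: the smallest b_sno above it.
SELECT b_sno
FROM   data
WHERE  a_sno = 10
AND    a_sno < b_sno
ORDER  BY b_sno
LIMIT  1;

-- Wrapped as a scalar subquery, a node without a greater neighbor
-- (7 in the sample data) yields NULL instead of no row.
SELECT (SELECT b_sno
        FROM   data
        WHERE  a_sno = 7
        AND    a_sno < b_sno
        ORDER  BY b_sno
        LIMIT  1) AS next_sno;
```

With the sample data the first query returns 13 and the second returns NULL. That NULL is what ends each chain in the recursive query.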

Key to performance is a matching index, which is already provided by the PK constraint PRIMARY KEY (a_sno,b_sno), not the other way round ~~(b_sno, a_sno)~~:

Related: Is a composite index also good for queries on the first field?

WITH RECURSIVE t AS (
   SELECT row_number() OVER (ORDER BY d.a_sno) AS grp
        , a_sno, min(b_sno) AS b_sno  -- the smallest one
   FROM   data d
   WHERE  a_sno < b_sno
   AND    NOT EXISTS (
      SELECT 1 FROM data
      WHERE  b_sno = d.a_sno
      AND    a_sno < b_sno
      )
   GROUP  BY a_sno
   )

, cte AS (
   SELECT grp, b_sno AS sno FROM t

   UNION ALL
   SELECT c.grp
       , (SELECT b_sno  -- correlated subquery
          FROM   data
          WHERE  a_sno = c.sno
          AND    a_sno < b_sno
          ORDER  BY b_sno
          LIMIT  1)
   FROM   cte  c
   WHERE  c.sno IS NOT NULL
   )
SELECT * FROM cte
WHERE  sno IS NOT NULL   -- eliminate row with NULL
UNION  ALL               -- no duplicates
SELECT grp, a_sno FROM t
ORDER  BY grp, sno;
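
The step lookup in the recursive term filters on a_sno and sorts by b_sno, which is the column order of the primary key index. One hedged way to check this on real data is EXPLAIN. The literal 10 is just an example value, and data_pkey is the default name Postgres gives the index behind the PRIMARY KEY constraint. On a tiny table like the sample the planner may still prefer a sequential scan, so the plan depends on table size and statistics.

```sql
-- Look for an index scan on data_pkey in the plan output.
EXPLAIN
SELECT b_sno
FROM   data
WHERE  a_sno = 10        -- example node, stands in for c.sno
AND    a_sno < b_sno
ORDER  BY b_sno
LIMIT  1;
```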

Less simple case

This applies if all nodes can be reached in ascending order with one or more branches from the root (smallest sno).

This time, get all greater sno and de-duplicate nodes that may be visited multiple times with UNION at the end:

WITH RECURSIVE t AS (
   SELECT dense_rank() OVER (ORDER BY d.a_sno) AS grp  -- dense_rank(): no gaps in grp when a root has several rows
        , a_sno, b_sno  -- get all rows for smallest a_sno
   FROM   data d
   WHERE  a_sno < b_sno
   AND    NOT EXISTS (
      SELECT 1 FROM data
      WHERE  b_sno = d.a_sno
      AND    a_sno < b_sno
      )
   )   
, cte AS (
   SELECT grp, b_sno AS sno FROM t

   UNION ALL
   SELECT c.grp, d.b_sno
   FROM   cte  c
   JOIN   data d ON d.a_sno = c.sno
                AND d.a_sno < d.b_sno  -- join to all connected rows
   )
SELECT grp, sno FROM cte
UNION                     -- eliminate duplicates
SELECT grp, a_sno FROM t  -- add first rows
ORDER  BY grp, sno;

Unlike the first solution, we don't get a trailing row with NULL here (that row was caused by the correlated subquery).
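
As a variant, and not part of the solution above, the duplicates can also be dropped during the recursion by using UNION instead of UNION ALL inside the rCTE. Postgres then discards rows that duplicate any previous result row, so a node reached over several branches is expanded only once per group. This sketch uses dense_rank() so that group numbers stay consecutive when a root has several rows. Whether it is faster depends on how often branches rejoin.

```sql
WITH RECURSIVE t AS (
   SELECT dense_rank() OVER (ORDER BY d.a_sno) AS grp
        , a_sno, b_sno               -- all rows for each root (smallest a_sno)
   FROM   data d
   WHERE  a_sno < b_sno
   AND    NOT EXISTS (
      SELECT 1 FROM data
      WHERE  b_sno = d.a_sno
      AND    a_sno < b_sno
      )
   )
, cte AS (
   SELECT grp, b_sno AS sno FROM t

   UNION                             -- not UNION ALL: drop rows already produced
   SELECT c.grp, d.b_sno
   FROM   cte  c
   JOIN   data d ON d.a_sno = c.sno
                AND d.a_sno < d.b_sno
   )
SELECT grp, sno FROM cte
UNION                                -- a root could only repeat here in edge cases; keep it safe
SELECT grp, a_sno FROM t
ORDER  BY grp, sno;
```

For the sample data this produces the ten rows of the desired output, with groups 1, 2 and 3.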

Both should perform very well, especially with long chains or many branches. The result is as desired:

SQL Fiddle (with added rows to demonstrate difficulty).

Undirected graph

If there are local minima that cannot be reached from the root with ascending traversal, the above solutions won't work. Consider Farhęg's solution in this case.
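
For reference, one generic way to handle the undirected case is sketched below. This is not Farhęg's solution, which is not reproduced here, and the CTE names edges, reach and comp are made up for illustration. It makes the edges symmetric, walks from every node to everything it can reach, and labels each node with the smallest node of its component. It assumes the graph is small enough for this to be practical: the intermediate result grows with the square of the component size.

```sql
WITH RECURSIVE edges AS (            -- make every connection traversable in both directions
   SELECT a_sno, b_sno FROM data
   UNION
   SELECT b_sno, a_sno FROM data
   )
, reach AS (                         -- (root, sno): sno is reachable from root
   SELECT a_sno AS root, a_sno AS sno FROM edges

   UNION                             -- UNION drops repeated pairs, so cycles terminate
   SELECT r.root, e.b_sno
   FROM   reach r
   JOIN   edges e ON e.a_sno = r.sno
   )
, comp AS (                          -- smallest node of the component each sno belongs to
   SELECT sno, min(root) AS comp_min
   FROM   reach
   GROUP  BY sno
   )
SELECT dense_rank() OVER (ORDER BY comp_min) AS grp
     , sno
FROM   comp
ORDER  BY grp, sno;
```

Because every node acts as a root, local minima that cannot be reached by ascending traversal are still assigned to the right group.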

answered Sep 21 '22 20:09

Erwin Brandstetter
