Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Left join with nearest value without duplicates

I want to achieve in MS SQL something like below, using 2 tables and through join instead of iteration.

From table A, I want each row to identify from table B which in the list is their nearest value, and when value has been selected, that value cannot re-used. Please help if you've done something like this before. Thank you in advance! #SOreadyToAsk

enter image description here

like image 796
ken lacoste Avatar asked Mar 14 '23 21:03

ken lacoste


1 Answers

Below is a set-based solution using CTEs and windowing functions.

The ranked_matches CTE assigns a closest match rank for each row in TableA along with a closest match rank for each row in TableB, using the index value as a tie breaker.

The best_matches CTE returns rows from ranked_matches that have the best rank (rank value 1) for both rankings.

Finally, the outer query uses a LEFT JOIN from TableA to the to the best_matches CTE to include the TableA rows that were not assigned a best match due to the closes match being already assigned.

Note that this does not return a match for the index 3 TableA row indicated in your sample results. The closes match for this row is TableB index 3, a difference of 83. However, that TableB row is a closer match to the TableA index 2 row, a difference of 14 so it was already assigned. Please clarify you question if this isn't what you want. I think this technique can be tweaked accordingly.

CREATE TABLE dbo.TableA(
      [index] int NOT NULL
        CONSTRAINT PK_TableA PRIMARY KEY
    , value int
    );
CREATE TABLE dbo.TableB(
      [index] int NOT NULL
        CONSTRAINT PK_TableB PRIMARY KEY
    , value int
    );
INSERT  INTO dbo.TableA
        ( [index], value )
VALUES  ( 1, 123 ),
        ( 2, 245 ),
        ( 3, 342 ),
        ( 4, 456 ),
        ( 5, 608 );

INSERT  INTO dbo.TableB
        ( [index], value )
VALUES  ( 1, 152 ),
        ( 2, 159 ),
        ( 3, 259 );

WITH 
      ranked_matches AS (
        SELECT 
              a.[index] AS a_index
            , a.value AS a_value
            , b.[index] b_index
            , b.value AS b_value
            , RANK() OVER(PARTITION BY a.[index] ORDER BY ABS(a.Value - b.value), b.[index]) AS a_match_rank
            , RANK() OVER(PARTITION BY b.[index] ORDER BY ABS(a.Value - b.value), a.[index]) AS b_match_rank
        FROM dbo.TableA AS a
        CROSS JOIN dbo.TableB AS b
    )
    , best_matches AS (
        SELECT
              a_index
            , a_value
            , b_index
            , b_value
        FROM ranked_matches
        WHERE
                a_match_rank = 1
            AND b_match_rank= 1
    )
SELECT
      TableA.[index] AS a_index
    , TableA.value AS a_value
    , best_matches.b_index
    , best_matches.b_value
FROM dbo.TableA
LEFT JOIN best_matches ON
    best_matches.a_index = TableA.[index]
ORDER BY
    TableA.[index];

EDIT:

Although this method uses CTEs, recursion is not used and is therefore not limited to 32K recursions. There may be room for improvement here from a performance perspective, though.

like image 164
Dan Guzman Avatar answered Apr 01 '23 17:04

Dan Guzman