Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to write SQL query that selects distinct pair values for specific criteria?

I'm having trouble formulating a query for the following problem:

For pair values that have a certain score, how do you group them in way that will only return distinct pair values with the best respective scores?

For example, lets say I have a table with the following row values:

(t1,p1,65)
(t1,p2,60)
(t1,p3,20)
(t2,p1,60)
(t2,p2,59)
(t2,p3,15)

The first two columns indicate the pair values and the third column represents the pair score.The best score is (t1,p1,65). Since t1 and p1 are now used, I want to exclude them from further analysis.

The next best score is (t2,p2,59). Even though (t1,p2) has a score of 60, I want to exclude it because "t1" has already been used. (t2,p1) also has a score of 60, but since p1 is also already used, this pair is excluded.

This results in the distinct pair score values of:

(t1,p1,65)
(t2,p2,59)

Is there any way to generate this result with just a query? I've tried to think of ways of grouping and partitioning the results, but since there has to be some accounting of values already used according to score rank, I'm finding this very difficult to approach.

EDIT:

To generate the data:

with t(t, p, score) as (
    (values ('t1','p1',65),
           ('t1','p2',60),
           ('t1','p3',20),
           ('t2','p1',60),
           ('t2','p2',59),
           ('t2','p3',15)
     ))
select t.* from t;
like image 633
Stephen Tableau Avatar asked Nov 01 '16 17:11

Stephen Tableau


People also ask

How do you SELECT distinct pairs in SQL?

To get the identical rows (based on two columns agent_code and ord_amount) once from the orders table, the following SQL statement can be used : SQL Code: SELECT DISTINCT agent_code,ord_amount FROM orders WHERE agent_code='A002';

How do I get unique two column combinations in SQL?

To select distinct combinations from two columns, you can use CASE statement. Let us create a table with some columns.

How do you SELECT distinct from one column?

Adding the DISTINCT keyword to a SELECT query causes it to return only unique values for the specified column list so that duplicate rows are removed from the result set. Since DISTINCT operates on all of the fields in SELECT's column list, it can't be applied to an individual field that are part of a larger group.


1 Answers

This problem has obviously been bothering me. The following appears to implement your logic, keeping arrays of visited values in rows:

with recursive t(t, p, score) as (
    (values ('t1','p1',65),
           ('t1','p2',60),
           ('t1','p3',20),
           ('t2','p1',60),
           ('t2','p2',59),
           ('t2','p3',15)
     )),
     cte(t, p, score, cnt, lastt, lastp, ts, ps) as (
        (select t.*, count(*) over ()::int, tt.t, tt.p, ARRAY[tt.t], ARRAY[tt.p]
         from t cross join
              (select t.* from t order by score desc limit 1) tt
        ) 
        union all
        select t, p, score, 
               sum(case when not (ts @> ARRAY[t] or ps @> ARRAY[p]) then 1 else 0 end) over ()::int,
               first_value(t) over (order by case when not (ts @> ARRAY[t] or ps @> ARRAY[p]) then score end desc nulls last),
               first_value(p) over (order by case when not (ts @> ARRAY[t] or ps @> ARRAY[p]) then score end desc nulls last),
               ts || first_value(t) over (order by case when not (ts @> ARRAY[t] or ps @> ARRAY[p]) then score end desc nulls last),
               ps || first_value(p) over (order by case when not (ts @> ARRAY[t] or ps @> ARRAY[p]) then score end desc nulls last)
        from cte 
        where cnt > 0
       )
 select *
 from cte
 where lastt = t and lastp = p and cnt > 0;
like image 112
Gordon Linoff Avatar answered Sep 19 '22 15:09

Gordon Linoff