Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Exclude rows with the same values in some columns

Tags:

sql

postgresql

I have the table like following:

id | col1 | col2 | col3   | col4
---+------+------+--------+-----------
 1 | abc  | 23   | data1  | otherdata1
 2 | def  | 41   | data2  | otherdata2
 3 | ghi  | 41   | data3  | otherdata3
 4 | jkl  | 58   | data4  | otherdata4
 5 | mno  | 23   | data1  | otherdata5
 6 | pqr  | 41   | data3  | otherdata6
 7 | stu  | 76   | data2  | otherdata7

How can I fast select rows where col2+col3 doesn't have duplicates? There is over 15 millions of rows in the table, so join may be not suitable.

Final result should look like this:

id | col1 | col2 | col3   | col4
---+------+------+--------+-----------
 2 | def  | 41   | data2  | otherdata2
 4 | jkl  | 58   | data4  | otherdata4
 7 | stu  | 76   | data2  | otherdata7
like image 975
Raifeg Avatar asked Feb 21 '26 09:02

Raifeg


2 Answers

Window functions are definitely one possibility. But, if you care about performance, it is also worth trying another approach and comparing the speed.

NOT EXISTS comes to mind:

select t.*
from table t
where not exists (select 1
                  from table t2
                  where t2.col2 = t.col2 and t2.col3 = t.col3 and
                        t2.id <> t.id
                 );

This can take advantage of an index on table(col2, col3).

like image 105
Gordon Linoff Avatar answered Feb 23 '26 23:02

Gordon Linoff


Not sure how fast this will be, but this should work:

select id, col1, col2, col3, col4
from (
  select id, col1, col2, col3, col4, 
         count(*) over (partition by col2, col3) as cnt
  from the_table
) t
where cnt = 1
order by id;

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!