I have the table like following:
id | col1 | col2 | col3 | col4
---+------+------+--------+-----------
1 | abc | 23 | data1 | otherdata1
2 | def | 41 | data2 | otherdata2
3 | ghi | 41 | data3 | otherdata3
4 | jkl | 58 | data4 | otherdata4
5 | mno | 23 | data1 | otherdata5
6 | pqr | 41 | data3 | otherdata6
7 | stu | 76 | data2 | otherdata7
How can I fast select rows where col2+col3 doesn't have duplicates? There is over 15 millions of rows in the table, so join may be not suitable.
Final result should look like this:
id | col1 | col2 | col3 | col4
---+------+------+--------+-----------
2 | def | 41 | data2 | otherdata2
4 | jkl | 58 | data4 | otherdata4
7 | stu | 76 | data2 | otherdata7
Window functions are definitely one possibility. But, if you care about performance, it is also worth trying another approach and comparing the speed.
NOT EXISTS comes to mind:
select t.*
from table t
where not exists (select 1
from table t2
where t2.col2 = t.col2 and t2.col3 = t.col3 and
t2.id <> t.id
);
This can take advantage of an index on table(col2, col3).
Not sure how fast this will be, but this should work:
select id, col1, col2, col3, col4
from (
select id, col1, col2, col3, col4,
count(*) over (partition by col2, col3) as cnt
from the_table
) t
where cnt = 1
order by id;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With