I have a table with a lot of records with some fields duplicated. I want the most common of each of those duplications.
So, if my table has data like below:
ID  Field1  Field2
1   A       10
2   A       12
3   B       5
4   A       10
5   B       5
6   A       10
7   B       8
8   B       5
9   A       10
I can group and get counts (DISTINCT is unnecessary once you GROUP BY):
select Field1, Field2, count(*) as cnt
from Table
group by Field1, Field2
order by Field1, cnt desc
And that will give me
Field1  Field2  Count
A       10      4
A       12      1
B       5       3
B       8       1
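The counting step can be reproduced end to end with a quick sketch. This uses Python's sqlite3 with an in-memory table loaded from the sample data above (the table name `t` is just a stand-in):

```python
import sqlite3

# In-memory database with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (ID integer, Field1 text, Field2 integer)")
rows = [(1, "A", 10), (2, "A", 12), (3, "B", 5), (4, "A", 10),
        (5, "B", 5), (6, "A", 10), (7, "B", 8), (8, "B", 5), (9, "A", 10)]
conn.executemany("insert into t values (?, ?, ?)", rows)

# Group by both fields and count each (Field1, Field2) pair.
counts = conn.execute("""
    select Field1, Field2, count(*) as cnt
    from t
    group by Field1, Field2
    order by Field1, cnt desc
""").fetchall()
print(counts)  # [('A', 10, 4), ('A', 12, 1), ('B', 5, 3), ('B', 8, 1)]
```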
However, I only want the records for each Field1 that have the highest count. I've been fighting with rank() over a partition and with subqueries, but I haven't found the correct syntax for grouping on two fields and then selecting the top record by count. I'm sure this has been asked before, but I can't find it.
I want to get the following
Field1  Field2  (optional) Count
A       10      4
B       5       3
The goal is to look at a table that has a small amount of incorrect data (rows where the link between Field1 and Field2 is wrong) and determine what each link SHOULD be based on what it usually is. I don't know how many bad records there are, so eliminating rows whose count falls below a fixed threshold would work, but it seems kludgy.
If it is better, I can make a temp table to put my distinct values into and then select from there, but it doesn't seem like that should be necessary.
I think this is what you're looking for:
select field1, field2, cnt
from (
    select field1, field2, cnt,
           rank() over (partition by field1 order by cnt desc) as rnk
    from (
        select Field1, Field2, count(*) as cnt
        from Table1
        group by Field1, Field2
    ) counts
) ranked
where rnk = 1;
Note that DISTINCT and the inner ORDER BY from your query are redundant once you GROUP BY, so I've dropped them. Also, rank() keeps ties: if two Field2 values are equally common for a given Field1, you'll get both rows back. Use row_number() instead if you want exactly one row per Field1.
SQL Fiddle: http://sqlfiddle.com/#!4/fe96d/3
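For a self-contained check without the fiddle, here is the same query run through Python's sqlite3 against the sample data (SQLite 3.25+, bundled with Python 3.7+, supports rank() over a partition; the table name `t` is a stand-in for your table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (ID integer, Field1 text, Field2 integer)")
conn.executemany("insert into t values (?, ?, ?)",
                 [(1, "A", 10), (2, "A", 12), (3, "B", 5), (4, "A", 10),
                  (5, "B", 5), (6, "A", 10), (7, "B", 8), (8, "B", 5), (9, "A", 10)])

# Rank each (Field1, Field2) pair by its count within Field1, then keep rank 1.
best = conn.execute("""
    select Field1, Field2, cnt
    from (
        select Field1, Field2, cnt,
               rank() over (partition by Field1 order by cnt desc) as rnk
        from (select Field1, Field2, count(*) as cnt
              from t
              group by Field1, Field2) counts
    ) ranked
    where rnk = 1
    order by Field1
""").fetchall()
print(best)  # [('A', 10, 4), ('B', 5, 3)]
```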