Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to select top 3 values from each group in a table with SQL which have duplicates [duplicate]

Tags:

Assume we have a table which has two columns, one column contains the names of some people and the other column contains some values related to each person. One person can have more than one value. Each value has a numeric type. The question is we want to select the top 3 values for each person from the table. If one person has less than 3 values, we select all the values for that person.

The issue can be solved if there are no duplicates in the table by the query provided in this article Select top 3 values from each group in a table with SQL . But if there are duplicates, what is the solution?

For example, if for one name John, he has 5 values related to him. They are 20,7,7,7,4. I need to return the name/value pairs as below order by value descending for each name:

-----------+-------+ | name     | value | -----------+-------+ | John     |    20 | | John     |     7 | | John     |     7 | -----------+-------+ 

Only 3 rows should be returned for John even though there are three 7s for John.

like image 334
PixelsTech Avatar asked May 23 '13 17:05

PixelsTech


1 Answers

In many modern DBMS (e.g. Postgres, Oracle, SQL-Server, DB2 and many others), the following will work just fine. It uses CTEs and ranking function ROW_NUMBER() which is part of the latest SQL standard:

 WITH cte AS   ( SELECT name, value,            ROW_NUMBER() OVER (PARTITION BY name                               ORDER BY value DESC                              )              AS rn     FROM t   ) SELECT name, value, rn FROM cte WHERE rn <= 3 ORDER BY name, rn ; 

Without CTE, only ROW_NUMBER():

SELECT name, value, rn FROM    ( SELECT name, value,            ROW_NUMBER() OVER (PARTITION BY name                               ORDER BY value DESC                              )              AS rn     FROM t   ) tmp  WHERE rn <= 3 ORDER BY name, rn ;  

Tested in:

  • Postgres
  • Oracle
  • SQL-Server

In MySQL and other DBMS that do not have ranking functions, one has to use either derived tables, correlated subqueries or self-joins with GROUP BY.

The (tid) is assumed to be the primary key of the table:

SELECT t.tid, t.name, t.value,              -- self join and GROUP BY        COUNT(*) AS rn FROM t   JOIN t AS t2     ON  t2.name = t.name     AND ( t2.value > t.value         OR  t2.value = t.value         AND t2.tid <= t.tid         ) GROUP BY t.tid, t.name, t.value HAVING COUNT(*) <= 3 ORDER BY name, rn ;   SELECT t.tid, t.name, t.value, rn FROM   ( SELECT t.tid, t.name, t.value,            ( SELECT COUNT(*)                -- inline, correlated subquery              FROM t AS t2              WHERE t2.name = t.name               AND ( t2.value > t.value                  OR  t2.value = t.value                  AND t2.tid <= t.tid                   )            ) AS rn     FROM t   ) AS t WHERE rn <= 3 ORDER BY name, rn ; 

Tested in MySQL

like image 190
ypercubeᵀᴹ Avatar answered Oct 22 '22 02:10

ypercubeᵀᴹ