Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using LIMIT within GROUP BY to get N results per group?

The following query:

SELECT year, id, rate FROM h WHERE year BETWEEN 2000 AND 2009 AND id IN (SELECT rid FROM table2) GROUP BY id, year ORDER BY id, rate DESC 

yields:

year    id  rate 2006    p01 8 2003    p01 7.4 2008    p01 6.8 2001    p01 5.9 2007    p01 5.3 2009    p01 4.4 2002    p01 3.9 2004    p01 3.5 2005    p01 2.1 2000    p01 0.8 2001    p02 12.5 2004    p02 12.4 2002    p02 12.2 2003    p02 10.3 2000    p02 8.7 2006    p02 4.6 2007    p02 3.3 

What I'd like is only the top 5 results for each id:

2006    p01 8 2003    p01 7.4 2008    p01 6.8 2001    p01 5.9 2007    p01 5.3 2001    p02 12.5 2004    p02 12.4 2002    p02 12.2 2003    p02 10.3 2000    p02 8.7 

Is there a way to do this using some kind of LIMIT like modifier that works within the GROUP BY?

like image 647
Wells Avatar asked Jan 25 '10 01:01

Wells


People also ask

Can I use limit with GROUP BY in SQL?

No, you can't LIMIT subqueries arbitrarily (you can do it to a limited extent in newer MySQLs, but not for 5 results per group). This is a groupwise-maximum type query, which is not trivial to do in SQL.

How do you get top and rows per group?

Selecting a top n records for each category from any table, can be done easily using row_number function which generates a sequential integer to each row within a partition of a result set.

How do you SELECT the first least maximum row per group in SQL?

To do that, you can use the ROW_NUMBER() function. In OVER() , you specify the groups into which the rows should be divided ( PARTITION BY ) and the order in which the numbers should be assigned to the rows ( ORDER BY ). You assign the row numbers within each group (i.e., year).


1 Answers

You could use GROUP_CONCAT aggregated function to get all years into a single column, grouped by id and ordered by rate:

SELECT   id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year FROM     yourtable GROUP BY id 

Result:

----------------------------------------------------------- |  ID | GROUPED_YEAR                                      | ----------------------------------------------------------- | p01 | 2006,2003,2008,2001,2007,2009,2002,2004,2005,2000 | | p02 | 2001,2004,2002,2003,2000,2006,2007                | ----------------------------------------------------------- 

And then you could use FIND_IN_SET, that returns the position of the first argument inside the second one, eg.

SELECT FIND_IN_SET('2006', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000'); 1  SELECT FIND_IN_SET('2009', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000'); 6 

Using a combination of GROUP_CONCAT and FIND_IN_SET, and filtering by the position returned by find_in_set, you could then use this query that returns only the first 5 years for every id:

SELECT   yourtable.* FROM   yourtable INNER JOIN (     SELECT       id,       GROUP_CONCAT(year ORDER BY rate DESC) grouped_year     FROM       yourtable     GROUP BY id) group_max   ON yourtable.id = group_max.id      AND FIND_IN_SET(year, grouped_year) BETWEEN 1 AND 5 ORDER BY   yourtable.id, yourtable.year DESC; 

Please see fiddle here.

Please note that if more than one row can have the same rate, you should consider using GROUP_CONCAT(DISTINCT rate ORDER BY rate) on the rate column instead of the year column.

The maximum length of the string returned by GROUP_CONCAT is limited, so this works well if you need to select a few records for every group.

like image 178
fthiella Avatar answered Sep 17 '22 13:09

fthiella