
LIMIT results to n unique column values?



I have some MySQL results like this:

| name | something_random |
| john | ekjalsdjalfjkldd |
| alex | akjsldfjaekallee |
| alex | jkjlkjslakjfjflj |
| alex | kajslejajejjaddd |
|  bob | ekakdie33kkd93ld |
|  bob | 33kd993kakakl3ll |
| paul | 3k309dki595k3lkd |
| paul | 3k399kkfkg93lk3l |

This goes on for thousands of rows. I need to limit the results to the rows belonging to the first 50 unique names. I think there is a simple solution to this, but I'm not sure what it is.

I've tried using derived tables and variables but can't quite get there. If I could figure out how to increment a variable once every time a name is different I think I could say WHERE variable <= 50.
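The counter idea above (bump a number each time the name changes, then keep rows where it is at most 50) is roughly what DENSE_RANK() computes. The following is a minimal sketch, not part of the original question. It assumes a database with window-function support (MySQL 8.0 or later; here SQLite 3.25 or later through Python's sqlite3 module, so it can run standalone). It reuses the sample rows from the table above, orders names alphabetically, and keeps 2 names rather than 50. The function name first_n_names and the alias name_rank are made up for illustration.

```python
import sqlite3

# Sample rows copied from the question's first table.
rows = [
    ("john", "ekjalsdjalfjkldd"),
    ("alex", "akjsldfjaekallee"),
    ("alex", "jkjlkjslakjfjflj"),
    ("alex", "kajslejajejjaddd"),
    ("bob", "ekakdie33kkd93ld"),
    ("bob", "33kd993kakakl3ll"),
    ("paul", "3k309dki595k3lkd"),
    ("paul", "3k399kkfkg93lk3l"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testTable (name TEXT, something_random TEXT)")
conn.executemany("INSERT INTO testTable VALUES (?, ?)", rows)


def first_n_names(conn, n):
    """Return every row that belongs to the first n distinct names.

    DENSE_RANK() gives all rows with the same name the same number and
    increases by one each time the name changes, which is the
    "increment a variable when the name differs" counter.
    """
    sql = """
        SELECT name, something_random
        FROM (
            SELECT name, something_random,
                   DENSE_RANK() OVER (ORDER BY name) AS name_rank
            FROM testTable
        ) AS ranked
        WHERE name_rank <= ?
        ORDER BY name, something_random
    """
    return conn.execute(sql, (n,)).fetchall()


limited = first_n_names(conn, 2)
for row in limited:
    print(row)
```

Any WHERE condition from the real query would go in the inner SELECT, so that names it excludes never receive a rank and do not count toward the limit.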


I've tried the Inner Join approach(es) suggested below. The problem is this:

The subselect SELECT DISTINCT name FROM testTable LIMIT 50 grabs the first 50 distinct names. Perhaps I wasn't clear enough in my original post, but this limits my query too much. In my query, not every name in the table is returned in the result. Let me modify my original example:

|   id | name | something_random |
|    1 | john | ekjalsdjalfjkldd |
|    4 | alex | akjsldfjaekallee |
|    4 | alex | jkjlkjslakjfjflj |
|    4 | alex | kajslejajejjaddd |
|    6 |  bob | ekakdie33kkd93ld |
|    6 |  bob | 33kd993kakakl3ll |
|   12 | paul | 3k309dki595k3lkd |
|   12 | paul | 3k399kkfkg93lk3l |

So I added in some id numbers here. These ID numbers pertain to the people's names in the tables. So you can see in the results, not every single person/name in the table is necessarily in the result (due to some WHERE condition). So the 50th distinct name in the list will always have an ID number higher than 49. The 50th person could be id 79, 234, 4954 etc...

So back to the problem. The subselect SELECT DISTINCT name FROM testTable LIMIT 50 selects the first 50 names in the table. That means that my search results will be limited to names that have ID <=50, which is too constricting. If there are certain names that don't show up in the query (due to some WHERE condition), then they are still counted as one of the 50 distinct names. So you end up with too few results.


To @trapper: This is a basic simplification of what my query looks like:

SELECT t1.id, t1.name, t2.details
FROM t1
LEFT JOIN t2 ON t1.id = t2.some_id
INNER JOIN
    (SELECT DISTINCT name FROM t1 ORDER BY id LIMIT 0,50) s ON s.name = t1.name

And my results look like this:

|   id | name |          details |
|    1 | john | ekjalsdjalfjkldd |
|    3 | alex | akjsldfjaekallee |
|    3 | alex | jkjlkjslakjfjflj |
|    4 | alex | kajslejajejjaddd |
|    6 |  bob | ekakdie33kkd93ld |
|    6 |  bob | 33kd993kakakl3ll |
|   12 | paul | 3k309dki595k3lkd |
|   12 | paul | 3k399kkfkg93lk3l |
|   37 | bill | kajslejajejjaddd |
|   37 | bill | ekakdie33kkd93ld |
|   41 | matt | 33kd993kakakl3ll |
|   50 | jake | 3k309dki595k3lkd |
|   50 | jake | 3k399kkfkg93lk3l |

The results stop at id=50. There are NOT 50 distinct names in the list. There are only roughly 23 distinct names.

Jake Wilson asked Mar 01 '12 23:03

2 Answers

My MySQL syntax may be rusty, but the idea is to use a subquery to select 50 distinct names, then join it back to the table on name and select the name and other information from the join.

select a.name, b.something_random
from `Table` b
    inner join (select distinct name from `Table` order by RAND() limit 0,50) a
         on a.name = b.name
tvanfosson answered Sep 19 '22 16:09



Edited: Ahh yes, I misread the question the first time. This should do the trick though :)

SELECT a.name, b.something_random
FROM `table` b
INNER JOIN (SELECT DISTINCT name FROM `table` ORDER BY RAND() LIMIT 0,50) a
     ON a.name = b.name ORDER BY a.name

How this works: the (SELECT DISTINCT name FROM `table` ORDER BY RAND() LIMIT 0,50) part is what pulls out the names to include in the join. Here I am taking 50 unique names at random, but you can change this to any other selection criteria you want.

Then you join those results back to your table. This links each of the 50 selected names to all of the rows with a matching name for your final results. Finally, ORDER BY a.name makes sure all the rows for each name end up grouped together.
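A minimal sketch of how this join behaves when the real query also has a WHERE condition, as in the question's follow-up. It is not the answerer's code. It runs the SQL through Python's sqlite3 module so it can be tried standalone, and it assumes a hypothetical visible column standing in for the unspecified WHERE condition, plus two invented rows (carl, dave) that the condition filters out. It also swaps SELECT DISTINCT ... ORDER BY id for GROUP BY name ORDER BY MIN(id), so that "first" is well defined, and uses a limit of 3 instead of 50.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE testTable ("
    "id INTEGER, name TEXT, something_random TEXT, visible INTEGER)"
)
# Hypothetical data: 'visible' stands in for the question's WHERE condition.
conn.executemany(
    "INSERT INTO testTable VALUES (?, ?, ?, ?)",
    [
        (1, "john", "ekjalsdjalfjkldd", 1),
        (2, "carl", "zzzzzzzzzzzzzzzz", 0),  # invented row, filtered out
        (3, "dave", "yyyyyyyyyyyyyyyy", 0),  # invented row, filtered out
        (4, "alex", "akjsldfjaekallee", 1),
        (4, "alex", "jkjlkjslakjfjflj", 1),
        (6, "bob", "ekakdie33kkd93ld", 1),
        (6, "bob", "33kd993kakakl3ll", 1),
        (12, "paul", "3k309dki595k3lkd", 1),
    ],
)


def rows_for_first_names(conn, n, filter_in_subquery):
    """Join the table to a derived table holding the first n names by id.

    When filter_in_subquery is False, the derived table counts names that
    the outer WHERE later removes, so fewer than n names survive.
    """
    inner_where = "WHERE visible = 1" if filter_in_subquery else ""
    sql = f"""
        SELECT b.id, b.name, b.something_random
        FROM testTable AS b
        INNER JOIN (
            SELECT name
            FROM testTable
            {inner_where}
            GROUP BY name
            ORDER BY MIN(id)
            LIMIT ?
        ) AS a ON a.name = b.name
        WHERE b.visible = 1
        ORDER BY b.id, b.something_random
    """
    return conn.execute(sql, (n,)).fetchall()


too_few = rows_for_first_names(conn, 3, filter_in_subquery=False)
as_wanted = rows_for_first_names(conn, 3, filter_in_subquery=True)
print(sorted({name for _, name, _ in too_few}))
print(sorted({name for _, name, _ in as_wanted}))
```

With the filter only in the outer query, the three slots go to john, carl and dave, and only john survives. Repeating the condition inside the derived table makes the limit count only names that can actually appear, giving john, alex and bob.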

trapper answered Sep 18 '22 16:09
