
MySQL: Select N rows, but with only unique values in one column

Given this data set:

ID  Name            City            Birthyear
1   Egon Spengler   New York        1957
2   Mac Taylor      New York        1955
3   Sarah Connor    Los Angeles     1959
4   Jean-Luc Picard La Barre        2305
5   Ellen Ripley    Nostromo        2092
6   James T. Kirk   Riverside       2233
7   Henry Jones     Chicago         1899

I need to find the 3 oldest people, but with at most one person per city.

If I just wanted the three oldest, the result would be:

  • Henry Jones / Chicago
  • Mac Taylor / New York
  • Egon Spengler / New York

However, since both Egon Spengler and Mac Taylor are located in New York, Egon Spengler would drop out and the next one (Sarah Connor / Los Angeles) would come in instead.
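
To pin down the expected result independently of any SQL dialect, here is a minimal Python sketch of the rule described above: walk the rows from oldest to youngest and keep the first person seen for each city. The function name `oldest_one_per_city` and the tie-break on ID are assumptions made for illustration; the question does not say how ties should be resolved.

```python
# Sample rows from the question: (ID, Name, City, Birthyear).
PEOPLE = [
    (1, "Egon Spengler", "New York", 1957),
    (2, "Mac Taylor", "New York", 1955),
    (3, "Sarah Connor", "Los Angeles", 1959),
    (4, "Jean-Luc Picard", "La Barre", 2305),
    (5, "Ellen Ripley", "Nostromo", 2092),
    (6, "James T. Kirk", "Riverside", 2233),
    (7, "Henry Jones", "Chicago", 1899),
]


def oldest_one_per_city(rows, n):
    """Return the n oldest people, keeping only the oldest person of each city."""
    seen_cities = set()
    picked = []
    # Walk from oldest to youngest; ID breaks ties between equal birth years
    # (an assumption, the question does not specify tie handling).
    for row in sorted(rows, key=lambda r: (r[3], r[0])):
        city = row[2]
        if city in seen_cities:
            continue  # an older person from this city was already picked
        seen_cities.add(city)
        picked.append(row)
        if len(picked) == n:
            break
    return picked


for _, name, city, year in oldest_one_per_city(PEOPLE, 3):
    print(f"{name} / {city} ({year})")
```

For the sample data this prints Henry Jones, Mac Taylor and Sarah Connor, which is the result any SQL solution should reproduce.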

Any elegant solutions?

Update:

Currently, a variation of ConroyP's query is the best and fastest solution:

SELECT P.*, COUNT(*) AS ct
   FROM people P
   JOIN (SELECT MIN(Birthyear) AS Birthyear
              FROM people 
              GROUP by City) P2 ON P2.Birthyear = P.Birthyear
   GROUP BY P.City
   ORDER BY P.Birthyear ASC 
   LIMIT 10;

His original query with "IN" is extremely slow with big data sets (I aborted it after 5 minutes), but moving the subquery into a JOIN speeds it up a lot: it took about 0.15 seconds for approx. 1 million rows in my test environment. I have one index on (City, Birthyear) and a second one on just Birthyear.
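
For anyone who wants to try this setup, here is a small sketch that builds the sample table with the two indexes described above and runs the JOIN query with `LIMIT 3`. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, so it says nothing about MySQL timings. The index names are made up. Also note that the query selects `P.*` while grouping only by `P.City`: SQLite accepts that, but as far as I know MySQL rejects it when `ONLY_FULL_GROUP_BY` is enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER);
    INSERT INTO people VALUES
        (1, 'Egon Spengler',   'New York',    1957),
        (2, 'Mac Taylor',      'New York',    1955),
        (3, 'Sarah Connor',    'Los Angeles', 1959),
        (4, 'Jean-Luc Picard', 'La Barre',    2305),
        (5, 'Ellen Ripley',    'Nostromo',    2092),
        (6, 'James T. Kirk',   'Riverside',   2233),
        (7, 'Henry Jones',     'Chicago',     1899);
    -- The two indexes mentioned in the update; the names are hypothetical.
    CREATE INDEX idx_people_city_birthyear ON people (City, Birthyear);
    CREATE INDEX idx_people_birthyear ON people (Birthyear);
""")

# The JOIN variant from the update, limited to 3 rows as the question asks.
rows = conn.execute("""
    SELECT P.*, COUNT(*) AS ct
    FROM people P
    JOIN (SELECT MIN(Birthyear) AS Birthyear
          FROM people
          GROUP BY City) P2 ON P2.Birthyear = P.Birthyear
    GROUP BY P.City
    ORDER BY P.Birthyear ASC
    LIMIT 3
""").fetchall()

for row in rows:
    print(row)
```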

Note: This is related to...

  • Selecting unique rows in a set of two possibilities
  • SQL Query to get latest price

BlaM asked Oct 10 '08 10:10


3 Answers

Probably not the most elegant of solutions, and the performance of IN may suffer on larger tables.

The nested query gets the minimum Birthyear for each city. Only records that have one of these Birthyear values are matched in the outer query. Ordering by age and then limiting to 3 results gets you the 3 oldest people who are also the oldest in their city (Egon Spengler drops out).

SELECT Name, City, Birthyear, COUNT(*) AS ct
FROM people
WHERE Birthyear IN (SELECT MIN(Birthyear)
                    FROM people
                    GROUP BY City)
GROUP BY City
ORDER BY Birthyear ASC
LIMIT 3;

+--------------+-------------+-----------+----+
| Name         | City        | Birthyear | ct |
+--------------+-------------+-----------+----+
| Henry Jones  | Chicago     |      1899 |  1 |
| Mac Taylor   | New York    |      1955 |  1 |
| Sarah Connor | Los Angeles |      1959 |  1 |
+--------------+-------------+-----------+----+

Edit - added GROUP BY City to the outer query, as people with the same birth year would return multiple rows. Grouping on the outer query ensures that only one result is returned per city if more than one person has that minimum Birthyear. The ct column shows whether more than one person in the city matched.
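
One caveat worth checking: the filter compares Birthyear only, not City, so a person whose Birthyear happens to equal the minimum of a different city also passes it. The sketch below illustrates this with Python's built-in `sqlite3` module as a stand-in for MySQL. The eighth row ("Hypothetical Person") is invented for the demonstration and is not part of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER);
    INSERT INTO people VALUES
        (1, 'Egon Spengler',       'New York',    1957),
        (2, 'Mac Taylor',          'New York',    1955),
        (3, 'Sarah Connor',        'Los Angeles', 1959),
        (4, 'Jean-Luc Picard',     'La Barre',    2305),
        (5, 'Ellen Ripley',        'Nostromo',    2092),
        (6, 'James T. Kirk',       'Riverside',   2233),
        (7, 'Henry Jones',         'Chicago',     1899),
        -- Invented row: born in 1959, which is the minimum for Los Angeles.
        (8, 'Hypothetical Person', 'New York',    1959);
""")

# Rows from New York that pass the Birthyear-only IN filter.
matched = conn.execute("""
    SELECT Name, City, Birthyear
    FROM people
    WHERE Birthyear IN (SELECT MIN(Birthyear) FROM people GROUP BY City)
      AND City = 'New York'
    ORDER BY Birthyear
""").fetchall()

for row in matched:
    print(row)
```

Both Mac Taylor (1955) and the invented 1959 row match, so after GROUP BY City the ct for New York would be 2 even though only one person there has the city's minimum Birthyear. Comparing City as well as Birthyear avoids this.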

ConroyP answered Sep 17 '22 13:09


This is probably not the most elegant or the quickest solution, but it should work. I am looking forward to seeing the solutions of real database gurus.

select p.* from people p,
(select city, min(Birthyear) as mby from people group by city) t
where p.city = t.city and p.Birthyear = t.mby
order by p.Birthyear asc
limit 3
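
A runnable sketch of this per-city join against the question's sample data, expressed on the Birthyear column (the oldest person is the one with the smallest Birthyear, hence MIN and ascending order). It uses Python's built-in `sqlite3` module as a stand-in for MySQL; the alias `min_birthyear` and the `LIMIT 3` are my additions. If two people in a city share the minimum Birthyear, both are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER);
    INSERT INTO people VALUES
        (1, 'Egon Spengler',   'New York',    1957),
        (2, 'Mac Taylor',      'New York',    1955),
        (3, 'Sarah Connor',    'Los Angeles', 1959),
        (4, 'Jean-Luc Picard', 'La Barre',    2305),
        (5, 'Ellen Ripley',    'Nostromo',    2092),
        (6, 'James T. Kirk',   'Riverside',   2233),
        (7, 'Henry Jones',     'Chicago',     1899);
""")

# Join each person to the minimum Birthyear of their own city, matching on
# both City and Birthyear so that other cities' minimums cannot leak in.
rows = conn.execute("""
    SELECT p.Name, p.City, p.Birthyear
    FROM people p
    JOIN (SELECT City, MIN(Birthyear) AS min_birthyear
          FROM people
          GROUP BY City) t
      ON p.City = t.City AND p.Birthyear = t.min_birthyear
    ORDER BY p.Birthyear ASC
    LIMIT 3
""").fetchall()

for row in rows:
    print(row)
```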

Tamas Czinege answered Sep 19 '22 13:09


Something like that?

SELECT
  Id, Name, City, Birthyear
FROM
  TheTable
WHERE
  Id = (SELECT Id FROM TheTable i WHERE i.City = TheTable.City ORDER BY Birthyear LIMIT 1)
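
A runnable sketch of this correlated-subquery idea, written with `LIMIT 1` because MySQL has no `TOP`, and with `=` instead of `IN` because, as far as I know, MySQL does not accept `LIMIT` inside an `IN` subquery. It uses Python's built-in `sqlite3` module as a stand-in for MySQL and the table name `people` from the question. The extra sort on Id (to make ties deterministic) and the outer `ORDER BY ... LIMIT 3` are my additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER);
    INSERT INTO people VALUES
        (1, 'Egon Spengler',   'New York',    1957),
        (2, 'Mac Taylor',      'New York',    1955),
        (3, 'Sarah Connor',    'Los Angeles', 1959),
        (4, 'Jean-Luc Picard', 'La Barre',    2305),
        (5, 'Ellen Ripley',    'Nostromo',    2092),
        (6, 'James T. Kirk',   'Riverside',   2233),
        (7, 'Henry Jones',     'Chicago',     1899);
""")

# For every row, look up the ID of the oldest person in the same city and
# keep the row only if it is that person; then take the 3 oldest overall.
rows = conn.execute("""
    SELECT ID, Name, City, Birthyear
    FROM people
    WHERE ID = (SELECT i.ID
                FROM people i
                WHERE i.City = people.City
                ORDER BY i.Birthyear, i.ID
                LIMIT 1)
    ORDER BY Birthyear
    LIMIT 3
""").fetchall()

for row in rows:
    print(row)
```

Because the subquery returns exactly one ID per city, this variant never returns two people from the same city, even when birth years tie.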

Tomalak answered Sep 18 '22 13:09