 

Using a DISTINCT clause to filter data but still pull other fields that are not DISTINCT

I am trying to write a query in PostgreSQL that pulls a set of ordered data and filters it by a distinct field. I also need to pull several other fields from the same table row, but they need to be left out of the DISTINCT evaluation. For example:

  SELECT DISTINCT(user_id) user_id, 
         created_at 
    FROM creations 
ORDER BY created_at   
   LIMIT 20

I need user_id to be DISTINCT, but I don't care whether the created_at date is unique or not. Because created_at is included in the DISTINCT evaluation, I am getting duplicate user_id values in my result set.
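A minimal sketch of the problem, using an in-memory SQLite database from Python's standard library as a stand-in for PostgreSQL (the table layout mirrors the query above; the sample rows are hypothetical). Because DISTINCT compares the whole (user_id, created_at) pair, a user with two different timestamps survives twice:

```python
import sqlite3

# Hypothetical sample data: user 1 has two creations, user 2 has one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE creations (user_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO creations VALUES (?, ?)",
    [(1, "2010-10-01"), (2, "2010-10-02"), (1, "2010-10-03")],
)

# DISTINCT applies to the entire select list, so both rows for user 1
# are kept: their created_at values differ.
rows = conn.execute(
    "SELECT DISTINCT user_id, created_at FROM creations "
    "ORDER BY created_at LIMIT 20"
).fetchall()
print(rows)  # [(1, '2010-10-01'), (2, '2010-10-02'), (1, '2010-10-03')]
```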

Also, the data must be ordered by date, so using DISTINCT ON is not an option here. It requires the DISTINCT ON field to be the first field in the ORDER BY clause, and that does not deliver the results I seek.

How do I properly use the DISTINCT clause but limit its scope to only one field while still selecting other fields?

asked Oct 05 '10 22:10 by mindtonic



2 Answers

As you've discovered, standard SQL treats DISTINCT as applying to the whole select-list, not just one column or a few columns. The reason for this is that it's ambiguous what value to put in the columns you exclude from the DISTINCT. For the same reason, standard SQL doesn't allow you to have ambiguous columns in a query with GROUP BY.

But PostgreSQL has a nonstandard extension to SQL to allow for what you're asking: DISTINCT ON (expr).

SELECT DISTINCT ON (user_id) user_id, created_at 
FROM creations 
ORDER BY user_id, created_at   
LIMIT 20

You have to include the distinct expression(s) as the leftmost part of your ORDER BY clause.

See the PostgreSQL manual's section on the DISTINCT clause for more information.
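The question's requirement that results be ordered by created_at can still be met: in PostgreSQL the usual approach is to wrap the DISTINCT ON query in a subquery and re-sort in the outer query (shown as a comment below). DISTINCT ON itself is PostgreSQL-only, so this runnable sketch uses SQLite from Python's standard library and swaps in the standard ROW_NUMBER() window function, which PostgreSQL also supports, to keep the earliest row per user. The sample rows are hypothetical, and window functions assume SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data; dates stored as ISO-8601 text so string order
# matches chronological order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE creations (user_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO creations VALUES (?, ?)",
    [(1, "2010-10-05"), (2, "2010-10-02"), (1, "2010-10-01"),
     (2, "2010-10-06"), (3, "2010-09-30")],
)

# PostgreSQL form (not runnable in SQLite):
#   SELECT user_id, created_at
#   FROM (SELECT DISTINCT ON (user_id) user_id, created_at
#         FROM creations
#         ORDER BY user_id, created_at) AS first_per_user
#   ORDER BY created_at
#   LIMIT 20;
query = """
SELECT user_id, created_at
FROM (
    SELECT user_id,
           created_at,
           -- number rows per user from earliest to latest
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS rn
    FROM creations
) AS ranked
WHERE rn = 1          -- keep only the earliest row for each user
ORDER BY created_at   -- the outer query is free to sort by date
LIMIT 20
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(3, '2010-09-30'), (1, '2010-10-01'), (2, '2010-10-02')]
```

The inner query decides which row represents each user; the outer query decides the order of the final result, so the two orderings no longer conflict.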

answered Oct 11 '22 11:10 by Bill Karwin


If you want the most recent created_at for each user, then I suggest you aggregate like this:

SELECT user_id, MAX(created_at) AS latest_created_at
FROM creations
WHERE ....
GROUP BY user_id
ORDER BY latest_created_at DESC  -- sort by the aggregate; raw created_at is not grouped

This will return the most recent created_at for each user_id. If you only want the top 20, append:

LIMIT 20
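A runnable sketch of this aggregate approach, again assuming SQLite via Python's sqlite3 module as a stand-in for PostgreSQL and hypothetical sample rows. The ORDER BY uses the aggregate's alias: PostgreSQL rejects a bare ORDER BY created_at here, because that column is neither in the GROUP BY clause nor inside an aggregate.

```python
import sqlite3

# Hypothetical sample data; ISO-8601 text dates compare chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE creations (user_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO creations VALUES (?, ?)",
    [(1, "2010-10-01"), (2, "2010-10-04"), (1, "2010-10-05"), (3, "2010-10-02")],
)

# One row per user, carrying that user's latest created_at,
# newest users first.
rows = conn.execute(
    """
    SELECT user_id, MAX(created_at) AS latest_created_at
    FROM creations
    GROUP BY user_id
    ORDER BY latest_created_at DESC
    LIMIT 20
    """
).fetchall()
print(rows)  # [(1, '2010-10-05'), (2, '2010-10-04'), (3, '2010-10-02')]
```

Swapping MAX for MIN returns each user's earliest creation instead; either way the aggregate, not DISTINCT, decides which created_at value accompanies each user_id.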

EDIT: This is basically the same thing Unreason said above: use aggregation to define which row you want the data from.

answered Oct 11 '22 12:10 by Matthew