
Select Distinct on one column, without ordering by that column

I'm trying to select only the IDs of a table that I'm querying on, and still be able to specify ordering on other columns.

First I tried simply doing:

SELECT DISTINCT countries.id
FROM countries
...
ORDER BY province_infos.population DESC, country_infos.population ASC

That doesn't work. It returns the error "for SELECT DISTINCT, ORDER BY expressions must appear in select list".

If I add province_infos.population and country_infos.population to the select list, the query runs, but I then get duplicate IDs, which I cannot have.

To resolve this, I attempted using DISTINCT ON():

SELECT DISTINCT ON (countries.id)
    countries.id, country_infos.population, province_infos.population
FROM countries
...
ORDER BY province_infos.population DESC, country_infos.population ASC

That then gives me the error "SELECT DISTINCT ON expressions must match initial ORDER BY expressions". I can't SELECT DISTINCT ON a column without ordering by it too.

It seems the only way for this to work is to do something like:

SELECT DISTINCT ON (countries.id) 
    countries.id
FROM countries
...
ORDER BY countries.id DESC, province_infos.population DESC, country_infos.population ASC

Unfortunately I can't do this, because ordering by the IDs first skews the results of the other orderings. And it seems the only way to avoid ordering by the IDs is to remove the DISTINCT from the select, but then I get duplicates.

Does anyone know how I can work around this?

EDIT: The ... I omitted shouldn't be relevant, but in case you want to see:

JOIN country_infos ON country_infos.country_refer = countries.id
JOIN languages ON languages.country_refer = countries.id
JOIN provinces ON provinces.country_refer = countries.id
JOIN province_infos ON province_infos.province_refer = provinces.id
WHERE country_infos.population > 10.3
AND languages.alphabet = 'Latin'

And I'm not just trying to get this working for this specific query. This is just an example I'm using to explain the predicament. I'm generating these kinds of queries automatically off of an arbitrary data structure.

robbieperry22 asked Mar 04 '23 23:03


1 Answer

The general answer to your question is that when you use DISTINCT ON (x, ...) in a SELECT statement in PostgreSQL, the database sorts by the values in the DISTINCT ON clause to make it easy to tell whether rows have distinct values: once the rows are ordered by those values, the database can remove duplicates in a single pass, comparing only adjacent rows. Because of this, the database forces you to sort by the DISTINCT ON columns first. Additional ORDER BY expressions may follow them, and they decide which row of each group is kept.
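To see that rule in isolation, here is a minimal sketch against a throwaway table. The table name demo and its rows are made up for illustration and are not part of the question's schema. DISTINCT ON keeps the first row of each group in ORDER BY order, so the sort keys after the leading id decide which row survives.

```sql
-- Hypothetical scratch table, not part of the question's schema.
CREATE TEMP TABLE demo (id int, population numeric);
INSERT INTO demo VALUES (1, 5.0), (1, 9.0), (2, 7.0);

-- ORDER BY must start with the DISTINCT ON expression (id).
-- The second key picks which row of each id group is kept:
-- here, the one with the largest population.
SELECT DISTINCT ON (id) id, population
FROM demo
ORDER BY id, population DESC;
-- keeps (1, 9.0) and (2, 7.0)
```

Swapping the ORDER BY to start with population instead of id reproduces the "must match initial ORDER BY expressions" error from the question.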

You can work around this by making your original query a subquery, like so:

SELECT t.id
FROM (
    SELECT DISTINCT ON (countries.id)
        countries.id,
        province_infos.population,
        country_infos.founding_date
    FROM countries
    ...
    ORDER BY countries.id, province_infos.population DESC, country_infos.founding_date ASC
) t
ORDER BY t.population DESC, t.founding_date ASC
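
If DISTINCT ON is not a hard requirement, a different technique that may be easier to generate automatically is to group by the ID and order by aggregates. The sketch below is not the subquery approach above. It assumes that ranking each country by its largest province is what you want, and it uses a hypothetical table named joined, with made-up rows, as a stand-in for the result of the joins.

```sql
-- Hypothetical stand-in for the joined rows
-- (one row per country/province pair); names and data are invented.
CREATE TEMP TABLE joined (
    country_id          int,
    province_population numeric,
    founding_date       date
);
INSERT INTO joined VALUES
    (1, 3.0, '1900-01-01'),
    (1, 8.0, '1900-01-01'),
    (2, 9.5, '1950-06-01'),
    (3, 8.0, '1800-03-15');

-- One row per country, ordered by its largest province,
-- then by founding date. No ordering by the ID is needed.
SELECT country_id AS id
FROM joined
GROUP BY country_id
ORDER BY MAX(province_population) DESC, MIN(founding_date) ASC;
-- returns ids 2, 3, 1
```

Note that this can differ from the subquery version when a country has several founding_date values: MIN() takes the smallest one overall, whereas DISTINCT ON takes the value from the single row it keeps.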
George S answered Mar 16 '23 02:03