PostgreSQL DISTINCT ON with different ORDER BY

I want to run this query:

    SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
    FROM   purchases
    WHERE  purchases.product_id = 1
    ORDER  BY purchases.purchased_at DESC

But I get this error:

PG::Error: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions

Adding address_id as the first ORDER BY expression silences the error, but I really don't want the result sorted by address_id. Is it possible to do this without ordering by address_id?

asked Mar 20 '12 21:03

sl_bug

2 Answers

Documentation says:

DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).

Official documentation

So you'll have to add the address_id to the order by.
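As a minimal sketch of that fix, the script below builds a throwaway table and runs the corrected query. The column names come from the question; the column types, the sample rows, and the temporary table itself are assumptions made for illustration.

```sql
BEGIN;

-- Hypothetical sample data: column names follow the question, types are assumed.
CREATE TEMP TABLE purchases (
    id           int PRIMARY KEY,
    address_id   int NOT NULL,
    product_id   int NOT NULL,
    purchased_at timestamp
);

INSERT INTO purchases VALUES
    (1, 10, 1, '2012-03-01 09:00'),
    (2, 10, 1, '2012-03-05 09:00'),
    (3, 20, 1, '2012-03-10 09:00'),
    (4, 20, 1, '2012-03-02 09:00'),
    (5, 30, 2, '2012-03-15 09:00');

-- address_id leads the ORDER BY, so it matches DISTINCT ON (address_id);
-- purchased_at DESC then decides which row of each group comes first.
SELECT DISTINCT ON (address_id) *
FROM   purchases
WHERE  product_id = 1
ORDER  BY address_id, purchased_at DESC;

ROLLBACK;  -- discard the temporary table
```

With this data the query returns the rows with id 2 and id 3, in that order. Each is the latest purchase for its address, but the result is sorted by address_id, not by purchased_at.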

Alternatively, if you want the full row of the most recent purchase for each address_id, with the result sorted by purchased_at, you are solving a greatest-n-per-group problem. It can be solved with either of the following approaches:

The general solution that should work in most DBMSs:

    SELECT t1.*
    FROM   purchases t1
    JOIN  (
        SELECT address_id, max(purchased_at) AS max_purchased_at
        FROM   purchases
        WHERE  product_id = 1
        GROUP  BY address_id
        ) t2 ON  t1.address_id   = t2.address_id
             AND t1.purchased_at = t2.max_purchased_at
    WHERE  t1.product_id = 1  -- exclude other products that share the same address and timestamp
    ORDER  BY t1.purchased_at DESC

A more PostgreSQL-oriented solution based on @hkf's answer:

    SELECT *
    FROM  (
        SELECT DISTINCT ON (address_id) *
        FROM   purchases
        WHERE  product_id = 1
        ORDER  BY address_id, purchased_at DESC
        ) t
    ORDER  BY purchased_at DESC

Problem clarified, extended and solved here: Selecting rows ordered by some column and distinct on another

answered Sep 22 '22 10:09

Mosty Mostacho

A subquery can solve it:

    SELECT *
    FROM  (
        SELECT DISTINCT ON (address_id) *
        FROM   purchases
        WHERE  product_id = 1
        ) p
    ORDER  BY purchased_at DESC;

Leading expressions in ORDER BY have to agree with columns in DISTINCT ON, so you can't order by different columns in the same SELECT.

Only use an additional ORDER BY in the subquery if you want to pick a particular row from each set:

    SELECT *
    FROM  (
        SELECT DISTINCT ON (address_id) *
        FROM   purchases
        WHERE  product_id = 1
        ORDER  BY address_id, purchased_at DESC  -- get "latest" row per address_id
        ) p
    ORDER  BY purchased_at DESC;
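To see how the two ORDER BY clauses divide the work, here is a small self-contained run. The column types, the sample rows, and the temporary table are assumptions for illustration; only the column names come from the question.

```sql
BEGIN;

-- Hypothetical sample data: column names follow the question, types are assumed.
CREATE TEMP TABLE purchases (
    id           int PRIMARY KEY,
    address_id   int NOT NULL,
    product_id   int NOT NULL,
    purchased_at timestamp
);

INSERT INTO purchases VALUES
    (1, 10, 1, '2012-03-01 09:00'),
    (2, 10, 1, '2012-03-05 09:00'),
    (3, 20, 1, '2012-03-10 09:00'),
    (4, 20, 1, '2012-03-02 09:00'),
    (5, 30, 2, '2012-03-15 09:00');

SELECT *
FROM  (
    -- Inner ORDER BY: picks the latest row within each address_id.
    SELECT DISTINCT ON (address_id) *
    FROM   purchases
    WHERE  product_id = 1
    ORDER  BY address_id, purchased_at DESC
    ) p
-- Outer ORDER BY: sorts the surviving rows, independent of address_id.
ORDER  BY purchased_at DESC;

ROLLBACK;  -- discard the temporary table
```

With this data the result is the row with id 3 (address 20) followed by the row with id 2 (address 10): one row per address, newest purchase first.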

If purchased_at can be NULL, use DESC NULLS LAST, because plain DESC sorts NULL values first. For best performance, create the index with the same sort order. See:

Sort by column ASC, but NULL values first?
Why does ORDER BY NULLS LAST affect the query plan on a primary key?
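A rough sketch of the NULLS LAST variant with a matching index follows. The index name, the leading product_id column in the index, the column types, and the sample rows are all assumptions; whether the planner actually uses the index depends on the data and its statistics.

```sql
BEGIN;

-- Hypothetical sample data: purchased_at is nullable here.
CREATE TEMP TABLE purchases (
    id           int PRIMARY KEY,
    address_id   int NOT NULL,
    product_id   int NOT NULL,
    purchased_at timestamp
);

INSERT INTO purchases VALUES
    (1, 10, 1, '2012-03-01 09:00'),
    (2, 10, 1, '2012-03-05 09:00'),
    (3, 20, 1, '2012-03-10 09:00'),
    (4, 20, 1, '2012-03-02 09:00'),
    (5, 30, 2, '2012-03-15 09:00'),
    (6, 10, 1, NULL),
    (7, 40, 1, NULL);

-- Assumed index: its sort order mirrors the inner ORDER BY below.
CREATE INDEX purchases_latest_idx
    ON purchases (product_id, address_id, purchased_at DESC NULLS LAST);

SELECT *
FROM  (
    SELECT DISTINCT ON (address_id) *
    FROM   purchases
    WHERE  product_id = 1
    ORDER  BY address_id, purchased_at DESC NULLS LAST
    ) p
ORDER  BY purchased_at DESC NULLS LAST;

ROLLBACK;  -- discard the temporary table and index
```

With this data the rows come back as id 3, id 2, id 7. Address 10 yields its latest dated purchase and not the NULL row, and address 40, which has only a NULL timestamp, sorts last.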

Related, with more explanation:

Select first row in each GROUP BY group?
Sort by column ASC, but NULL values first?

answered Sep 23 '22 10:09

Erwin Brandstetter

Tags: sql, postgresql, sql-order-by, distinct-on
