I am using Rails 4.2 with PostgreSQL. I have a Product model and a Purchase model, where Product has many Purchases. I want to find the distinct recently purchased products. Initially I tried:
Product.joins(:purchases)
.select("DISTINCT products.*, purchases.updated_at") #postgresql requires order column in select
.order("purchases.updated_at DESC")
This, however, results in duplicates because DISTINCT applies to the whole row: it keeps every tuple where the pair (products.id, purchases.updated_at) is unique. I just want the products with a distinct id after the join: if a product id appears multiple times in the join, only the first one should be selected. So I also tried:
Product.joins(:purchases)
.select("DISTINCT ON (products.id) purchases.updated_at, products.*")
.order("products.id, purchases.updated_at") # PostgreSQL requires the DISTINCT ON expression to match the leftmost ORDER BY expression
This doesn't work because, due to this constraint, I need to put the product id first in the ORDER BY clause, which produces an unexpected order.
What is the rails way to achieve this?
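To see why the first attempt returns duplicates, here is a minimal plain-Ruby sketch of the two notions of distinctness. It runs against a made-up in-memory join result rather than the database; the rows, names, and dates are hypothetical.

```ruby
# Hypothetical rows of products JOIN purchases: one row per purchase.
joined = [
  { id: 1, name: "Pen",  updated_at: Time.utc(2015, 1, 1) },
  { id: 1, name: "Pen",  updated_at: Time.utc(2015, 3, 1) },
  { id: 2, name: "Book", updated_at: Time.utc(2015, 2, 1) },
]

# Like SELECT DISTINCT products.*, purchases.updated_at: whole rows are
# compared, so both purchases of product 1 survive.
distinct_rows = joined.uniq

# What the question asks for: one row per product id, newest purchase first.
per_product = joined
  .sort_by { |row| row[:updated_at] }
  .reverse
  .uniq { |row| row[:id] }

puts distinct_rows.map { |row| row[:id] }.inspect
puts per_product.map { |row| row[:id] }.inspect
```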
Building on @ErwinBrandstetter's answer, I finally found the right way of doing this. The query to find distinct recent purchases is:
SELECT *
FROM (
SELECT DISTINCT ON (pr.id)
pu.updated_at, pr.*
FROM Product pr
JOIN Purchases pu ON pu.product_id = pr.id
) sub
ORDER BY updated_at DESC NULLS LAST;
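The ORDER BY inside the subquery is left out here because the outer query does the final ordering. Note, however, that without it DISTINCT ON keeps an arbitrary purchase row per product, so the selected updated_at is not guaranteed to be the most recent one; keep ORDER BY pr.id, pu.updated_at DESC NULLS LAST inside the subquery if that matters.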
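A rough plain-Ruby sketch of how the inner row order affects which purchase DISTINCT ON keeps. It assumes DISTINCT ON behaves like "keep the first row of each group in the current row order"; the `distinct_on` helper and the sample rows are hypothetical and not part of any library.

```ruby
# Hypothetical helper mimicking DISTINCT ON (key): keep the first row per key
# in whatever order the rows arrive.
def distinct_on(rows, key)
  rows.uniq { |row| row[key] }
end

# Hypothetical join output, in an order the database happened to produce.
rows = [
  { id: 1, updated_at: Time.utc(2015, 1, 1) },
  { id: 2, updated_at: Time.utc(2015, 2, 1) },
  { id: 1, updated_at: Time.utc(2015, 3, 1) },
]

# No inner ordering: product 1 keeps its older purchase date.
unordered = distinct_on(rows, :id)

# Like ORDER BY id, updated_at DESC: each product keeps its newest date.
ordered = distinct_on(rows.sort_by { |r| [r[:id], -r[:updated_at].to_i] }, :id)

# Like the outer ORDER BY updated_at DESC.
result = ordered.sort_by { |r| -r[:updated_at].to_i }

puts unordered.inspect
puts result.inspect
```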
The Rails way of doing this is:
inner_query = Product.joins(:purchases)
.select("DISTINCT ON (products.id) products.*, purchases.updated_at as date") #This selects all the unique purchased products.
result = Product.from("(#{inner_query.to_sql}) as unique_purchases")
.select("unique_purchases.*").order("unique_purchases.date DESC")
The second (and better) way to do this, as suggested by @ErwinBrandstetter, is:
SELECT *
FROM Product pr
JOIN (
SELECT product_id, max(updated_at) AS updated_at
FROM Purchases
GROUP BY 1
) pu ON pu.product_id = pr.id
ORDER BY pu.updated_at DESC NULLS LAST;
which can be written in Rails as
join_query = Purchase.select("product_id, max(updated_at) as date")
.group(1) #This selects most recent date for all purchased products
result = Product.joins("INNER JOIN (#{join_query.to_sql}) as unique_purchases ON products.id = unique_purchases.product_id")
.order("unique_purchases.date DESC") # newest first, matching the ORDER BY ... DESC in the SQL
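What the aggregate-then-join approach computes can be sketched in plain Ruby, using in-memory arrays as hypothetical stand-ins for the two tables (all names and dates below are made up for illustration):

```ruby
# Hypothetical in-memory stand-ins for the products and purchases tables.
products = [
  { id: 1, name: "Pen" },
  { id: 2, name: "Book" },
  { id: 3, name: "Lamp" }, # never purchased
]
purchases = [
  { product_id: 1, updated_at: Time.utc(2015, 1, 1) },
  { product_id: 2, updated_at: Time.utc(2015, 2, 1) },
  { product_id: 1, updated_at: Time.utc(2015, 3, 1) },
]

# Like SELECT product_id, max(updated_at) ... GROUP BY 1
latest = purchases
  .group_by { |pu| pu[:product_id] }
  .map { |product_id, rows| [product_id, rows.map { |r| r[:updated_at] }.max] }
  .to_h

# Like the INNER JOIN on product id followed by ORDER BY date DESC.
recent = products
  .select { |pr| latest.key?(pr[:id]) }
  .map { |pr| pr.merge(date: latest[pr[:id]]) }
  .sort_by { |pr| pr[:date] }
  .reverse

recent.each { |pr| puts "#{pr[:name]} #{pr[:date]}" }
```

Because the aggregation happens before the join, each product is matched against at most one row, which is why no DISTINCT is needed afterwards.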
Use a subquery and add a different ORDER BY clause in the outer SELECT:
SELECT *
FROM (
SELECT DISTINCT ON (pr.id)
pu.updated_at, pr.*
FROM Product pr
JOIN Purchases pu ON pu.product_id = pr.id -- guessing
ORDER BY pr.id, pu.updated_at DESC NULLS LAST
) sub
ORDER BY updated_at DESC NULLS LAST;
But if all you need from Purchases is updated_at, you can get this cheaper with a simple aggregate in a subquery before you join:
SELECT *
FROM Product pr
JOIN (
SELECT product_id, max(updated_at) AS updated_at
FROM Purchases
GROUP BY 1
) pu ON pu.product_id = pr.id -- guessing
ORDER BY pu.updated_at DESC NULLS LAST;
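About NULLS LAST: in PostgreSQL, NULL values sort first in descending order by default, so NULLS LAST keeps rows without a date at the end.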
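A plain-Ruby sketch of the resulting order, with nil standing in for NULL (the sample rows are hypothetical):

```ruby
# Hypothetical rows; nil stands in for a NULL updated_at.
rows = [
  { id: 1, updated_at: Time.utc(2015, 1, 1) },
  { id: 2, updated_at: nil },
  { id: 3, updated_at: Time.utc(2015, 3, 1) },
]

# Split rows with a date from rows without one, then sort the dated rows newest first.
with_date, without_date = rows.partition { |r| r[:updated_at] }
newest_first = with_date.sort_by { |r| r[:updated_at] }.reverse

# Like ORDER BY updated_at DESC NULLS LAST: NULLs go to the end.
nulls_last = newest_first + without_date

# PostgreSQL's default for DESC is NULLS FIRST, shown here for contrast.
nulls_first = without_date + newest_first

puts nulls_last.map { |r| r[:id] }.inspect
puts nulls_first.map { |r| r[:id] }.inspect
```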
Or even simpler, but not as fast while retrieving all rows:
SELECT pr.*, max(pu.updated_at) AS updated_at
FROM Product pr
JOIN Purchases pu ON pu.product_id = pr.id
GROUP BY pr.id -- must be primary key
ORDER BY max(pu.updated_at) DESC NULLS LAST; -- an ordinal like 2 would count the columns expanded from pr.*
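Product.id needs to be defined as the primary key for this to work, because PostgreSQL then treats the other columns of pr as functionally dependent on it and allows them in the SELECT list without listing them in GROUP BY.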
If you fetch only a small selection (with a WHERE clause restricting to just one or a few pr.id, for instance), this will be faster.