Rails - Distinct ON after a join

I am using Rails 4.2 with PostgreSQL. I have a Product model and a Purchase model, where a Product has many Purchases. I want to find the distinct, most recently purchased products. Initially I tried:

Product.joins(:purchases)
.select("DISTINCT products.*, purchases.updated_at") #postgresql requires order column in select
.order("purchases.updated_at DESC")

This, however, returns duplicates, because DISTINCT applies to the whole selected row: every unique combination of products.id and purchases.updated_at is kept. I only want products with a distinct id after the join; if a product id appears multiple times in the join, only the first row should be selected. So I also tried:
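To illustrate why the first attempt still returns duplicates, here is a small plain-Ruby sketch with no database involved. The `rows` array and its values are made up for illustration; `uniq` stands in for DISTINCT over the whole selected row, and `uniq` with a block stands in for deduplicating on the product id alone.

```ruby
# Hypothetical joined rows: one entry per (product, purchase) pair.
rows = [
  { id: 1, name: "pen",  purchased_at: Time.utc(2015, 9, 20) },
  { id: 1, name: "pen",  purchased_at: Time.utc(2015, 9, 24) },
  { id: 2, name: "book", purchased_at: Time.utc(2015, 9, 22) },
]

# DISTINCT compares the whole selected row, so two purchases of the same
# product at different times still count as two distinct rows.
distinct_rows = rows.uniq

# Deduplicating on the product id alone is the behaviour that
# DISTINCT ON (products.id) provides.
distinct_products = rows.uniq { |row| row[:id] }

puts distinct_rows.size      # 3
puts distinct_products.size  # 2
```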

Product.joins(:purchases)
.select("DISTINCT ON (products.id) purchases.updated_at, products.*")
.order("products.id, purchases.updated_at") #postgres requires that the DISTINCT ON expressions match the leftmost ORDER BY expressions

This doesn't work either: because of that constraint I have to put products.id first in the ORDER BY clause, so the results come back ordered by product id rather than by most recent purchase.

What is the rails way to achieve this?

aandis asked Sep 25 '15 05:09


2 Answers

Building on @ErwinBrandstetter's answer, I finally found the right way of doing this. The query to find distinct recent purchases is

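SELECT *
FROM  (
   SELECT DISTINCT ON (pr.id)
          pu.updated_at, pr.*
   FROM   Product pr
   JOIN   Purchases pu ON pu.product_id = pr.id
   ORDER  BY pr.id, pu.updated_at DESC NULLS LAST
   ) sub
ORDER  BY updated_at DESC NULLS LAST;

The ORDER BY inside the subquery is still needed: it makes DISTINCT ON keep the most recent purchase for each product. Without it, which purchase row survives for each product is unpredictable. The outer ORDER BY then sorts the deduplicated rows by purchase date.

The Rails way of doing this is:

inner_query = Product.joins(:purchases)
  .select("DISTINCT ON (products.id) products.*, purchases.updated_at AS date")
  .order("products.id, purchases.updated_at DESC NULLS LAST") #This selects each purchased product once, with its most recent purchase date.

result = Product.from("(#{inner_query.to_sql}) AS unique_purchases")
  .select("unique_purchases.*").order("unique_purchases.date DESC NULLS LAST")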

The second (and better) way to do this as suggested by @ErwinBrandstetter is

SELECT *
FROM   Product pr
JOIN  (
   SELECT product_id, max(updated_at) AS updated_at
   FROM   Purchases 
   GROUP  BY 1
   ) pu ON pu.product_id = pr.id
ORDER  BY pu.updated_at DESC NULLS LAST;

which can be written in Rails as

join_query = Purchase.select("product_id, max(updated_at) AS date")
  .group(1) #This selects the most recent purchase date for each purchased product

result = Product.joins("INNER JOIN (#{join_query.to_sql}) AS unique_purchases ON products.id = unique_purchases.product_id")
  .order("unique_purchases.date DESC NULLS LAST")

aandis answered Sep 30 '22 04:09


Use a subquery and add a different ORDER BY clause in the outer SELECT:

SELECT *
FROM  (
   SELECT DISTINCT ON (pr.id)
          pu.updated_at, pr.*
   FROM   Product pr
   JOIN   Purchases pu ON pu.product_id = pr.id  -- guessing
   ORDER  BY pr.id, pu.updated_at DESC NULLS LAST
   ) sub
ORDER  BY updated_at DESC NULLS LAST;
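The two-step logic of the query above can be sketched in plain Ruby, without a database. This is only an illustration on made-up data: the `rows` array is hypothetical, `sort_by` followed by `uniq` mimics `DISTINCT ON` with its inner `ORDER BY`, and the final `sort_by` mimics the outer `ORDER BY`. NULL handling is left out.

```ruby
# Hypothetical joined rows: one entry per (product, purchase) pair.
rows = [
  { id: 1, purchased_at: Time.utc(2015, 9, 20) },
  { id: 2, purchased_at: Time.utc(2015, 9, 22) },
  { id: 1, purchased_at: Time.utc(2015, 9, 24) },
  { id: 3, purchased_at: Time.utc(2015, 9, 25) },
]

# Inner query: ORDER BY id, updated_at DESC, then keep the first row per id.
# The sort is what guarantees that the kept row is the latest purchase.
latest_per_product = rows
  .sort_by { |row| [row[:id], -row[:purchased_at].to_i] }
  .uniq { |row| row[:id] }

# Outer query: re-sort the deduplicated rows by purchase time, newest first.
recent_products = latest_per_product.sort_by { |row| -row[:purchased_at].to_i }

p recent_products.map { |row| row[:id] }  # [3, 1, 2]
```

Dropping the first `sort_by` would still yield one row per product, but product 1 would keep its Sep 20 purchase and sort after product 2, which is why the inner ordering matters.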

Details for DISTINCT ON:

  • Select first row in each GROUP BY group?

Or some other query technique:

  • Optimize GROUP BY query to retrieve latest record per user

But if all you need from Purchases is updated_at, you can get this cheaper with a simple aggregate in a subquery before you join:

SELECT *
FROM   Product pr
JOIN  (
   SELECT product_id, max(updated_at) AS updated_at
   FROM   Purchases 
   GROUP  BY 1
   ) pu ON pu.product_id = pr.id  -- guessing
ORDER  BY pu.updated_at DESC NULLS LAST;
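As a rough illustration of the aggregate-then-join idea, here is a plain-Ruby sketch on made-up data. The `products` and `purchases` arrays are hypothetical stand-ins for the tables; `group_by` plus `max` plays the role of the aggregating subquery, and the `select` plays the role of the inner join.

```ruby
# Hypothetical stand-ins for the two tables.
products = [
  { id: 1, name: "pen" },
  { id: 2, name: "book" },
  { id: 3, name: "lamp" },
]
purchases = [
  { product_id: 1, updated_at: Time.utc(2015, 9, 20) },
  { product_id: 2, updated_at: Time.utc(2015, 9, 22) },
  { product_id: 1, updated_at: Time.utc(2015, 9, 24) },
]

# Subquery: SELECT product_id, max(updated_at) ... GROUP BY 1
latest_by_product = purchases
  .group_by { |purchase| purchase[:product_id] }
  .transform_values { |group| group.map { |purchase| purchase[:updated_at] }.max }

# Inner join: products without any purchase (id 3 here) drop out,
# then the remaining products are sorted newest purchase first.
recent_products = products
  .select { |product| latest_by_product.key?(product[:id]) }
  .sort_by { |product| -latest_by_product[product[:id]].to_i }

p recent_products.map { |product| product[:name] }  # ["pen", "book"]
```

The aggregation touches each purchase once and produces a single row per product before the join, so the join never has to deduplicate anything.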

About NULLS LAST:

  • PostgreSQL sort by datetime asc, null first?

Or even simpler, but not as fast when retrieving all rows:

SELECT pr.*, max(pu.updated_at) AS updated_at  -- qualified, in case Product has its own updated_at
FROM   Product pr
JOIN   Purchases pu ON pu.product_id = pr.id
GROUP  BY pr.id  -- must be primary key
ORDER  BY 2 DESC NULLS LAST;

Product.id needs to be defined as primary key for this to work. Details:

  • PostgreSQL - GROUP BY clause
  • Return a grouped list with occurrences using Rails and PostgreSQL

If you fetch only a small selection (with a WHERE clause restricting to just one or a few pr.id, for instance), this last query will be the faster one.

Erwin Brandstetter answered Sep 30 '22 04:09