Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to make ActiveRecord query unique by a column

Tags:

I have a Company model that has many Disclosures. The Disclosure has columns named title, pdf and pdf_sha256.

class Company < ActiveRecord::Base
  has_many :disclosures
end

class Disclosure < ActiveRecord::Base
  belongs_to :company
end

I want to make it unique by pdf_sha256 and if pdf_sha256 is nil that should be treated as unique.

If it is an Array, I'll write like this.

companies_with_sha256 = company.disclosures.where.not(pdf_sha256: nil).group_by(&:pdf_sha256).map do |key,values|
  values.max_by{|v| v.title.length}
end
companies_without_sha256 = company.disclosures.where(pdf_sha256: nil)
companies = companies_with_sha256 + companeis_without_sha256

How can I get the same result by using ActiveRecord query?

like image 438
ironsand Avatar asked Oct 11 '17 12:10

ironsand


1 Answers

It is possible to do it in one query by first getting a different id for each different pdf_sha256 as a subquery, then in the query getting the elements within that set of ids by passing the subquery as follows:

def unique_disclosures_by_pdf_sha256(company)
  subquery = company.disclosures.select('MIN(id) as id').group(:pdf_sha256)
  company.disclosures.where(id: subquery)
    .or(company.disclosures.where(pdf_sha256: nil))
end

The great thing about this is that ActiveRecord is lazy loaded, so the first subquery will not be run and will be merged to the second main query to create a single query in the database. It will then retrieve all the disclosures unique by pdf_sha256 plus all the ones that have pdf_sha256 set to nil.

In case you are curious, given a company, the resulting query will be something like:

SELECT  "disclosures".* FROM "disclosures" 
WHERE (
  "disclosures"."company_id" = $1 AND "disclosures"."id" IN (
    SELECT MAX(id) as id FROM "disclosures" WHERE "disclosures"."company_id" = $2 GROUP BY "disclosures"."pdf_sha256"
  ) 
  OR "disclosures"."company_id" = $3 AND "disclosures"."pdf_sha256" IS NULL
)

The great thing about this solution is that the returned value is an ActiveRecord query, so it won't be loaded until you actually need. You can also use it to keep chaining queries. Example, you can select only the id instead of the whole model and limit the number of results returned by the database:

unique_disclosures_by_pdf_sha256(company).select(:id).limit(10).each { |d| puts d }
like image 188
Alex Santos Avatar answered Sep 30 '22 00:09

Alex Santos