Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find all records which have a count of an association greater than zero

1) To get Projects with at least 1 vacancy:

Project.joins(:vacancies).group('projects.id')

2) To get Projects with more than 1 vacancy:

Project.joins(:vacancies).group('projects.id').having('count(project_id) > 1')

3) Or, if Vacancy model sets counter cache:

belongs_to :project, counter_cache: true

then this will work, too:

Project.where('vacancies_count > ?', 1)

Inflection rule for vacancy may need to be specified manually?


joins uses an inner join by default so using Project.joins(:vacancies) will in effect only return projects that have an associated vacancy.

UPDATE:

As pointed out by @mackskatz in the comment, without a group clause, the code above will return duplicate projects for projects with more than one vacancies. To remove the duplicates, use

Project.joins(:vacancies).group('projects.id')

UPDATE:

As pointed out by @Tolsee, you can also use distinct.

Project.joins(:vacancies).distinct

As an example

[10] pry(main)> Comment.distinct.pluck :article_id
=> [43, 34, 45, 55, 17, 19, 1, 3, 4, 18, 44, 5, 13, 22, 16, 6, 53]
[11] pry(main)> _.size
=> 17
[12] pry(main)> Article.joins(:comments).size
=> 45
[13] pry(main)> Article.joins(:comments).distinct.size
=> 17
[14] pry(main)> Article.joins(:comments).distinct.to_sql
=> "SELECT DISTINCT \"articles\".* FROM \"articles\" INNER JOIN \"comments\" ON \"comments\".\"article_id\" = \"articles\".\"id\""

Yeah, vacancies is not a field in the join. I believe you want:

Project.joins(:vacancies).group("projects.id").having("count(vacancies.id)>0")

# None
Project.joins(:vacancies).group('projects.id').having('count(vacancies) = 0')
# Any
Project.joins(:vacancies).group('projects.id').having('count(vacancies) > 0')
# One
Project.joins(:vacancies).group('projects.id').having('count(vacancies) = 1')
# More than 1
Project.joins(:vacancies).group('projects.id').having('count(vacancies) > 1')

Performing an inner join to the has_many table combined with a group or uniq is potentially very inefficient, and in SQL this would be better implemented as a semi-join that uses EXISTS with a correlated subquery.

This allows the query optimiser to probe the vacancies table to check for the existence of a row with the correct project_id. It doesn't matter whether there is one row or a million that have that project_id.

That's not as straightforward in Rails, but can be achieved with:

Project.where(Vacancies.where("vacancies.project_id = projects.id").exists)

Similarly, find all projects that have no vacancies:

Project.where.not(Vacancies.where("vacancies.project_id = projects.id").exists)

Edit: in recent Rails versions you get a deprecation warning telling you to not to rely on exists being delegated to arel. Fix this with:

Project.where.not(Vacancies.where("vacancies.project_id = projects.id").arel.exists)

Edit: if you're uncomfortable with raw SQL, try:

Project.where.not(Vacancies.where(Vacancy.arel_table[:project_id].eq(Project.arel_table[:id])).arel.exists)

You can make this less messy by adding class methods to hide the use of arel_table, for example:

class Project
  def self.id_column
    arel_table[:id]
  end
end

... so ...

Project.where.not(
  Vacancies.where(
    Vacancy.project_id_column.eq(Project.id_column)
  ).arel.exists
)

In Rails 4+, you can also use includes or eager_load to get the same answer:

Project.includes(:vacancies).references(:vacancies).
        where.not(vacancies: {id: nil})

Project.eager_load(:vacancies).where.not(vacancies: {id: nil})

I think there's a simpler solution:

Project.joins(:vacancies).distinct