Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to move sorting to the database level

I have a Rails app, which uses postgresql for a database, that sorts different types of users by location, and then by reputation points they receive for various activities on the site. This is an example query

 @lawyersbylocation = User.lawyers_by_province(province).sort_by{ |u| -u.total_votes }

The query calls the scope lawyers_by_province on the User.rb model:

 scope :lawyers_by_province, lambda {|province|
  joins(:contact).
  where( contacts: {province_id: province},
         users: {lawyer: true})

  }

And then, still on the User.rb model, it calculates reputation points they have.

 def total_votes
    answerkarma = AnswerVote.joins(:answer).where(answers: {user_id: self.id}).sum('value') 
    contributionkarma = Contribution.where(user_id: self.id).sum('value')
    bestanswer = BestAnswer.joins(:answer).where(answers: {user_id: self.id}).sum('value') 
    answerkarma + contributionkarma + bestanswer
 end

I've been told that if the site reaches a certain number of users, then it will become incredibly slow because it's sorting in Ruby rather than at the database level. I know that comment refers to the total_votes method, but I'm not sure if the lawyers_by_province is happening at the database level or in ruby, in that it's using Rails helpers to query the db. Seems like a mix of both to me, but I'm not sure about the effect of that on efficiency.

Can you show me how to write this so that the query is happening at the db level and therefore in a more efficient way that won't break my site?

Update Here are the three schemes for models in total_votes method.

 create_table "answer_votes", force: true do |t|
    t.integer  "answer_id"
    t.integer  "user_id"
    t.integer  "value"
    t.boolean  "lawyervote"
    t.boolean  "studentvote"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  add_index "answer_votes", ["answer_id"], name: "index_answer_votes_on_answer_id", using: :btree
  add_index "answer_votes", ["lawyervote"], name: "index_answer_votes_on_lawyervote", using: :btree
  add_index "answer_votes", ["studentvote"], name: "index_answer_votes_on_studentvote", using: :btree
  add_index "answer_votes", ["user_id"], name: "index_answer_votes_on_user_id", using: :btree



create_table "best_answers", force: true do |t|
    t.integer  "answer_id"
    t.integer  "user_id"
    t.integer  "value"
    t.datetime "created_at"
    t.datetime "updated_at"
    t.integer  "question_id"
  end

  add_index "best_answers", ["answer_id"], name: "index_best_answers_on_answer_id", using: :btree
  add_index "best_answers", ["user_id"], name: "index_best_answers_on_user_id", using: :btree



create_table "contributions", force: true do |t|
    t.integer  "user_id"
    t.integer  "answer_id"
    t.integer  "value"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  add_index "contributions", ["answer_id"], name: "index_contributions_on_answer_id", using: :btree
  add_index "contributions", ["user_id"], name: "index_contributions_on_user_id", using: :btree

Also, here is the contact scheme which contains the province_id used in the lawyers_by_province scope on user.rb model

  create_table "contacts", force: true do |t|
    t.string   "firm"
    t.string   "address"
    t.integer  "province_id"
    t.string   "city"
    t.string   "postalcode"
    t.string   "mobile"
    t.string   "office"
    t.integer  "user_id"
    t.string   "website"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

Update Trying to apply the answer by @Shawn, I put this method in the user.rb model

 def self.total_vote_sql
    "( " +
    [
     AnswerVote.joins(:answer).select("user_id, value"),
     Contribution.select("user_id, value"),
     BestAnswer.joins(:answer).select("user_id, value")
    ].map(&:to_sql) * " UNION ALL " + 
    ") as total_votes "
  end

and then in the controller, I did this (putting User in front of total_vote_sql)

@lawyersbyprovince = User.select("users.*, sum(total_votes.value) as total_votes").joins("left outer join #{User.total_vote_sql} on users.id = total_votes.user_id").
                            order("total_votes desc").lawyers_by_province(province)

It's giving me this error

ActiveRecord::StatementInvalid in LawyerProfilesController#index

PG::Error: ERROR: column reference "user_id" is ambiguous LINE 1: ..."user_id" = "users"."id" left outer join ( SELECT user_id, v... ^ : SELECT users.*, sum(total_votes.value) as total_votes FROM "users" INNER JOIN "contacts" ON "contacts"."user_id" = "users"."id" left outer join ( SELECT user_id, value FROM "answer_votes" INNER JOIN "answers" ON "answers"."id" = "answer_votes"."answer_id" UNION ALL SELECT user_id, value FROM "contributions" UNION ALL SELECT user_id, value FROM "best_answers" INNER JOIN "answers" ON "answers"."id" = "best_answers"."answer_id") as total_votes on users.id = total_votes.user_id WHERE "contacts"."province_id" = 6 AND "users"."lawyer" = 't' ORDER BY total_votes desc

Update After applying edits to Shawn's post, the error message is now this:

PG::Error: ERROR: column reference "user_id" is ambiguous LINE 1: ..."user_id" = "users"."id" left outer join ( SELECT user_id as... ^ : SELECT users.*, sum(total_votes.value) as total_votes FROM "users" INNER JOIN "contacts" ON "contacts"."user_id" = "users"."id" left outer join ( SELECT user_id as tv_user_id, value FROM "answer_votes" INNER JOIN "answers" ON "answers"."id" = "answer_votes"."answer_id" UNION ALL SELECT user_id as tv_user_id, value FROM "contributions" UNION ALL SELECT user_id as tv_user_id, value FROM "best_answers" INNER JOIN "answers" ON "answers"."id" = "best_answers"."answer_id") as total_votes on users.id = total_votes.tv_user_id WHERE "contacts"."province_id" = 6 AND "users"."lawyer" = 't' ORDER BY total_votes desc
like image 344
BrainLikeADullPencil Avatar asked May 17 '13 15:05

BrainLikeADullPencil


1 Answers

Caveat: I'm quite new to Rails, but this is my technique for keeping sane while needing to continually go straight to the database for performance reasons, which I need to do all the time because you can only have two of the following

  1. Processing of bulk data
  2. A pure-Rails technique
  3. Good performance

Anyway, once you need to get into these hybrid methodologies, which are part-ruby part-SQL I feel like you might as well go the whole hog and opt for a pure SQL solution.

  1. It's easier to debug because you're isolating the two code layers more effectively.
  2. It's easier to optimise the SQL because you stand a better chance of getting a dedicated SQL person to look at it for you if it's not your strong-point.

I think the SQL that you're looking for here is along the lines of:

with cte_scoring as (
  select
    users.id,
    (select Coalesce(sum(value),0) from answer_votes  where answer_votes.user_id  = users.id) +
    (select Coalesce(sum(value),0) from best_answers  where best_answers.user_id  = users.id) +
    (select Coalesce(sum(value),0) from contributions where contributions.user_id = users.id) total_score
  from
    users join
    contacts on (contacts.user_id = users.id)
  where
    users.lawyer         = 'true'          and
    contacts.province_id = #{province.id})
select   id,
         total_score
from     cte_scoring
order by total_score desc
limit    #{limit_number}

This ought to give you the best possible performance -- the subqueries in the SELECT are not ideal but the technique does apply filtering on which user_id you're checking the score for.

Integrating into Rails: If you define sql_string as the SQL code:

scoring = ActiveRecord::Base.connection.execute sql_string

... then you get an array of hashes back that you work with like this:

scoring.each do |lawyer_score|
  lawyer = User.find(lawyer_score["id"])
  score  = lawyer_score["total_score"]
  ...
end
like image 61
David Aldridge Avatar answered Sep 21 '22 13:09

David Aldridge