Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use preloaded collection in recursive methods

I've the following self referential association:

class Action < ActiveRecord::Base
  # self referential association
  has_many :action_parents
  has_many :parents, through: :action_parents
  has_many :action_children, class_name: 'ActionParent', foreign_key: 'parent_id'
  has_many :children, through: :action_children, source: :action
  …
  def should_finish
    should_start + duration
  end

  def should_start
    # my_start is a field in db: if there are no parents (root) it will use this field
    return my_start if parents.empty?
    parents.map(&:should_finish).sort.last
  end
end

My problem is the fact that should_finish and should_start are calling each other and even if I preload the parents it keeps resulting in many queries:

Action.includes(:parents).last.should_finish
# a new query every time it checks for parents

Any idea on how to cache actions and parents?

EDIT - let me give some context:

# actions table:        actions_parents table:
# id | duration         task_id | parent_id
# 1  | 5                2       | 1
# 2  | 10               3       | 1
# 3  | 20               4       | 2
# 4  | 15               4       | 3
#
#                      |--------------|
#                      | action 2     |
#         |---------- >| duration: 10 |
#         |            |--------------|
#         |                     |
#  |--------------|             |--------->|--------------|
#  | action 1     |                        | action 4     |
#  | duration: 5  |                        | duration: 15 |
#  |--------------|             |--------->|--------------|
#         |                     |
#         |            |--------------|
#         |----------->| action 3     |
#                      | duration: 20 |
#                      |--------------|

PS: there are no circular dependencies.

Assuming I have a tree my_start field of some day at 10:00:00:

# action | should_start | should_finish
# -------------------------------------
# 1      | 10:00:00*    | 10:00:05
# 2      | 10:00:05     | 10:00:15
# 3      | 10:00:05     | 10:00:25
# 4      | 10:00:25**   | 10:00:40
#
# * value from db since there is no parent
# ** should_finish of parent with latest should_finish (action 3)

I thought it could preload all actions using Action.includes(:parents)

like image 292
gabrielhilal Avatar asked Jan 26 '16 16:01

gabrielhilal


2 Answers

I'll throw a wild one before I know the specifics,

Assuming there are no prominent loops in the parents structure you can not help yourself by caching anything short of caching the entire table because every time you hit parents you will hit different parents for every action instance and no caching strategy, including the rails one, will rescue you short of moving the entire dataset into the cache.

Thing is, what you seem to be trying to do is really hard to do with a relational database and seems exactly the reason why graph databases were invented (see What are graph databases & When to use a graph database & Neo4j on Heroku)

Best thing you can do short of going to a graph database or caching your entire actions table is to optimize the queries (use pluck) and possibly rewrite them to a PLSQL function.

Plan B is to have your knowledge about your data come to your rescue,

  • do the values in should_start, duration and should_finish change? Does it change a lot?
  • is the data real-time critical? (i.e. s it OK to receive slightly outdated information every now and then)
  • does the way you structure data need to be more read-friendly or write-friendly?
  • leading to the question: does it make sense to make them database fields of Action model so you don't have to traverse every time you lookup?
    • i.e. do you do read operations much more than writes and
    • you can recalculate the calculated fields in a background job
  • are you accessing should_start and should_finish very often in a small time window?
  • how good are you with Neo4j :D
  • ....

EDIT 1

Only solution I see at the moment is to un-recursify the problem. Try this:

store in a string/text field the ids of the parent structure, for example

  • action 4 would have [1,2,3],
  • actions 2 & 3 would have [1] and
  • action 1 would have []

then when you map the ancestor_ids array into a hash of id => action

def ancestry_hash
  @ancestry_hash ||= Hash[ Action.includes(:action_parents).where(id: ancestor_ids).map{|action| [action.id, action]} ]
end

and then reimplement the recursive query to traverse this hash and not the activerecord tree or you will trigger additional queries. Something like:

def should_finish(id = self.id)
  should_start(id) + ancestry_hash[id].duration
end

def should_start(id = self.id)
  # my_start is a field in db: if there are no parents (root) it will use this field
  action = ancestry_hash[id]
  return my_start if action.action_parents.empty?
  action.action_parents.pluck(:parent_id).map{ |parent_id| should_finish(parent_id) }.sort.last
end

I didn't test the code, but I hope you get the idea, it should be close enough to this

like image 185
bbozo Avatar answered Nov 05 '22 08:11

bbozo


The problem:

In a nutshell, you have 2 issues. One is that you are actually attempting to preload more than you need (?!?), and the other is that Rails is not eager loading what you actually need because of the recursive nature of the logic.

To explain a bit further, consider this:

my_action.parents.map(&:parents).flatten.map(&:parents)

Rails will:

  1. first grab all parents for the given action
  2. then loop over each of those parents and grab their parents
  3. then flatten those 'grandparents' to an array, loop over each of them and fetch their parents

Note that in this case there isn't much point in eager loading the first-level parents since you are only starting with an instance of an Action - not a collection. Calling .parents will fetch all first-level parents for that action in one pass (which is what eager loading would have done anyhow).

So what happens when you are starting with a collection (ActiveRelation) instead of an instance?

Action.some_scope.includes(:parents).map(&:parents)

In this case, the parents of ALL actions included in the scope will be eager loaded. Calling .map(&:parents) will NOT trigger any further SQL calls, and this is the whole point of eager loading with includes(). However, there are 2 things that sort of defeat the whole purpose of this - and you are doing both of them :/

Firstly, your starting point isn't actually a collection of actions, as you are immediately calling .last. As such the fetching of all parents for ALL actions is meaningless - you only needed the 'last' one! Because of this, Rails is smart enough to scope down, and will only eager load the parents of the 'last' action. However, there wasn't much benefit to eager loading in this case, since calling .parents would have resulted in the same single query at a later time. (While there is a small benefit to loading in advance if later operations need to occur faster, that has limited utility in this case). So with or without the .includes statement, you would only have performed a single query to fetch parents for the 'last' action.

More importantly, you are recursively calling .parents on each of these parents, and Rails had absolutely no idea that you were going to do this. Moreover, recursive calls are by nature not pre-fetchable (without knowing a few tricks), so there really isn't any way to tell ActiveRecord, or using 'vanilla' SQL for that matter, to walk the chain and figure out which parents are needed until after it's already done that (making the point moot). All this leads to a nightmare of an N+1 situation, as you are experiencing.

Some solutions:

There are several ways to mitigate or eliminate the N+1 problem - in order of implementation complexity:

  • Prefetch up to N levels of parents (assumes you know what max(N) is)

Action.last.parents.includes(parents: {parents: :parents}) # grabs up to 4 levels

  • Skip SQL entirely, load all actions into a hash of Action arrays keyed by associated child_id and use non-ActiveRecord methods to aggregate what you need using simple Ruby. This will deteriorate quickly as your data grows, but it might be good enough for you - at least for the moment.

  • Use a schema that would let you determine the ancestry tree in advance, and use that utility to help with this calculation. The example given by @bbozo is one way to do this - you can also explore such gems as ancestry, acts_as_tree, awesome_nested_set, closure_tree and possibly others to help you out with this.

  • Use a database specific function that actually does the recursive calculation in a single call. PostgreSQL, Oracle, and MS-SQL have this capability, while MySQL and SQLite do not. This will probably get you the best performance, but can be complex to write using just the ActiveRecord query interface.

like image 25
PinnyM Avatar answered Nov 05 '22 08:11

PinnyM