I've the following self referential association:
class Action < ActiveRecord::Base
# self referential association
has_many :action_parents
has_many :parents, through: :action_parents
has_many :action_children, class_name: 'ActionParent', foreign_key: 'parent_id'
has_many :children, through: :action_children, source: :action
…
def should_finish
should_start + duration
end
def should_start
# my_start is a field in db: if there are no parents (root) it will use this field
return my_start if parents.empty?
parents.map(&:should_finish).sort.last
end
end
My problem is the fact that should_finish
and should_start
are calling each other and even if I preload the parents it keeps resulting in many queries:
Action.includes(:parents).last.should_finish
# a new query every time it checks for parents
Any idea on how to cache actions
and parents
?
EDIT - let me give some context:
# actions table: actions_parents table:
# id | duration task_id | parent_id
# 1 | 5 2 | 1
# 2 | 10 3 | 1
# 3 | 20 4 | 2
# 4 | 15 4 | 3
#
# |--------------|
# | action 2 |
# |---------- >| duration: 10 |
# | |--------------|
# | |
# |--------------| |--------->|--------------|
# | action 1 | | action 4 |
# | duration: 5 | | duration: 15 |
# |--------------| |--------->|--------------|
# | |
# | |--------------|
# |----------->| action 3 |
# | duration: 20 |
# |--------------|
PS: there are no circular dependencies.
Assuming I have a tree my_start
field of some day at 10:00:00
:
# action | should_start | should_finish
# -------------------------------------
# 1 | 10:00:00* | 10:00:05
# 2 | 10:00:05 | 10:00:15
# 3 | 10:00:05 | 10:00:25
# 4 | 10:00:25** | 10:00:40
#
# * value from db since there is no parent
# ** should_finish of parent with latest should_finish (action 3)
I thought it could preload all actions using Action.includes(:parents)
I'll throw a wild one before I know the specifics,
Assuming there are no prominent loops in the parents structure you can not help yourself by caching anything short of caching the entire table because every time you hit parents you will hit different parents for every action instance and no caching strategy, including the rails one, will rescue you short of moving the entire dataset into the cache.
Thing is, what you seem to be trying to do is really hard to do with a relational database and seems exactly the reason why graph databases were invented (see What are graph databases & When to use a graph database & Neo4j on Heroku)
Best thing you can do short of going to a graph database or caching your entire actions table is to optimize the queries (use pluck
) and possibly rewrite them to a PLSQL function.
Plan B is to have your knowledge about your data come to your rescue,
should_start
, duration
and should_finish
change? Does it change a lot?Action
model so you don't have to traverse every time you lookup?
should_start
and should_finish
very often in a small time window?EDIT 1
Only solution I see at the moment is to un-recursify the problem. Try this:
store in a string/text field the ids of the parent structure, for example
[1,2,3]
, [1]
and []
then when you map the ancestor_ids
array into a hash of id => action
def ancestry_hash
@ancestry_hash ||= Hash[ Action.includes(:action_parents).where(id: ancestor_ids).map{|action| [action.id, action]} ]
end
and then reimplement the recursive query to traverse this hash and not the activerecord tree or you will trigger additional queries. Something like:
def should_finish(id = self.id)
should_start(id) + ancestry_hash[id].duration
end
def should_start(id = self.id)
# my_start is a field in db: if there are no parents (root) it will use this field
action = ancestry_hash[id]
return my_start if action.action_parents.empty?
action.action_parents.pluck(:parent_id).map{ |parent_id| should_finish(parent_id) }.sort.last
end
I didn't test the code, but I hope you get the idea, it should be close enough to this
The problem:
In a nutshell, you have 2 issues. One is that you are actually attempting to preload more than you need (?!?), and the other is that Rails is not eager loading what you actually need because of the recursive nature of the logic.
To explain a bit further, consider this:
my_action.parents.map(&:parents).flatten.map(&:parents)
Rails will:
Note that in this case there isn't much point in eager loading the first-level parents since you are only starting with an instance of an Action - not a collection. Calling .parents
will fetch all first-level parents for that action in one pass (which is what eager loading would have done anyhow).
So what happens when you are starting with a collection (ActiveRelation) instead of an instance?
Action.some_scope.includes(:parents).map(&:parents)
In this case, the parents of ALL actions included in the scope will be eager loaded. Calling .map(&:parents)
will NOT trigger any further SQL calls, and this is the whole point of eager loading with includes()
. However, there are 2 things that sort of defeat the whole purpose of this - and you are doing both of them :/
Firstly, your starting point isn't actually a collection of actions, as you are immediately calling .last
. As such the fetching of all parents for ALL actions is meaningless - you only needed the 'last' one! Because of this, Rails is smart enough to scope down, and will only eager load the parents of the 'last' action. However, there wasn't much benefit to eager loading in this case, since calling .parents
would have resulted in the same single query at a later time. (While there is a small benefit to loading in advance if later operations need to occur faster, that has limited utility in this case). So with or without the .includes
statement, you would only have performed a single query to fetch parents for the 'last' action.
More importantly, you are recursively calling .parents
on each of these parents, and Rails had absolutely no idea that you were going to do this. Moreover, recursive calls are by nature not pre-fetchable (without knowing a few tricks), so there really isn't any way to tell ActiveRecord, or using 'vanilla' SQL for that matter, to walk the chain and figure out which parents are needed until after it's already done that (making the point moot). All this leads to a nightmare of an N+1 situation, as you are experiencing.
Some solutions:
There are several ways to mitigate or eliminate the N+1 problem - in order of implementation complexity:
Action.last.parents.includes(parents: {parents: :parents}) # grabs up to 4 levels
Skip SQL entirely, load all actions into a hash of Action arrays keyed by associated child_id and use non-ActiveRecord methods to aggregate what you need using simple Ruby. This will deteriorate quickly as your data grows, but it might be good enough for you - at least for the moment.
Use a schema that would let you determine the ancestry tree in advance, and use that utility to help with this calculation. The example given by @bbozo is one way to do this - you can also explore such gems as ancestry, acts_as_tree, awesome_nested_set, closure_tree and possibly others to help you out with this.
Use a database specific function that actually does the recursive calculation in a single call. PostgreSQL, Oracle, and MS-SQL have this capability, while MySQL and SQLite do not. This will probably get you the best performance, but can be complex to write using just the ActiveRecord query interface.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With