Does anybody have an efficient algorithm to retrieve all ancestors of an mptt queryset? The best I could think of so far is something like this:
def qs_ancestors(queryset):
if isinstance(queryset, EmptyQuerySet):
return queryset
queryset_aggs = queryset.values_list('tree_id', 'level').annotate(max_lft=Max('lft'), min_rght=Min('rght'))
new_queryset = queryset.none()
for tree_id, level, max_lft, min_rght in queryset_aggs:
ancestors = MyModel.objects.filter(
tree_id=tree_id,
level__lt=level,
lft__lte=max_lft,
rght__gte=min_rght,
)
new_queryset = ancestors | new_queryset
return new_queryset
There are two problems with this approach:
number_of_trees*number_of_levels
clauses in the final query, which can get very large very fastI am open to caching the ancestors somewhere else, but I cannot think of a way to do efficiently. I considered adding a field with a comma separated list of ancestor's ids and then doing a GROUP_CONCAT
(I am in MySQL) inside an extra, but I think that could get huge/slow.
I had to write a similar algorithm once. I had a view displaying a MPTT tree, it's was a very large tree so I couldn't load all of it's data in the HTML template. So I displayed only the root nodes in the initial load and used Ajax to load the other nodes.
It was working good until my boss asked me to implement a 'search' option. The search had to look in all nodes and explode the tree were it found a match. It took me a while to figure this out, but I finally got it. Here is the solution a came up with:
from django.db.models import Q
def get_parents(self, qs):
tree_list = {}
query = Q()
for node in qs:
if node.tree_id not in tree_list:
tree_list[node.tree_id] = []
parent = node.parent.pk if node.parent is not None else None,
if parent not in tree_list[node.tree_id]:
tree_list[node.tree_id].append(parent)
query |= Q(lft__lt=node.lft, rght__gt=node.rght, tree_id=node.tree_id)
return YourModel.objects.filter(query)
It needs only two queries to run, the initial qs
passed as an argument and the final queryset returned by the function. The tree_list
is a dictionary that stores nodes that were already added to the queryset, it's an optimization and it's not necessary for the algorithm to work. But since I was working with a relative big tree I had to include it.
I guess you could turn this method into a manager and make it more generic, i.e. make it work for any MPTT model and not only YourModel
How about:
def qs_ancestors(queryset):
if isinstance(queryset, EmptyQuerySet):
return queryset
new_queryset = queryset.none()
for obj in queryset:
new_queryset = new_queryset | obj.get_ancestors()
return new_queryset
It's still len(queryset) clauses. You could potentially reduce the number of clauses a bit by doing a preprocess pass that removes objects that are ancestors of other objects in the queryset, something like:
min_obj_set = []
for obj in queryset.order_by('tree_id', '-level'):
for obj2 in min_obj_set:
if obj.is_ancestor_of(obj2):
break
else:
min_obj_set.append(obj)
Although the above snippet is only an example, you'll probably want to use a BST if your querset contains a significant amount of objects.
You'll have to test if this yeilds an increase in speed vs. the larger DB query, though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With