Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

efficient function to retrieve a queryset of ancestors of an mptt queryset

Does anybody have an efficient algorithm to retrieve all ancestors of an mptt queryset? The best I could think of so far is something like this:

def qs_ancestors(queryset):
    if isinstance(queryset, EmptyQuerySet):
        return queryset
    queryset_aggs = queryset.values_list('tree_id', 'level').annotate(max_lft=Max('lft'), min_rght=Min('rght'))
    new_queryset = queryset.none()
    for tree_id, level, max_lft, min_rght in queryset_aggs:
        ancestors = MyModel.objects.filter(
           tree_id=tree_id,
           level__lt=level, 
           lft__lte=max_lft,
           rght__gte=min_rght,
        )
        new_queryset = ancestors | new_queryset
    return new_queryset

There are two problems with this approach:

  1. It fails if there are branches that aren't next to each other (ie it doesn't really work)
  2. It is highly inefficient because it ends up have number_of_trees*number_of_levels clauses in the final query, which can get very large very fast

I am open to caching the ancestors somewhere else, but I cannot think of a way to do efficiently. I considered adding a field with a comma separated list of ancestor's ids and then doing a GROUP_CONCAT (I am in MySQL) inside an extra, but I think that could get huge/slow.

like image 921
Bacon Avatar asked Jun 24 '11 17:06

Bacon


2 Answers

I had to write a similar algorithm once. I had a view displaying a MPTT tree, it's was a very large tree so I couldn't load all of it's data in the HTML template. So I displayed only the root nodes in the initial load and used Ajax to load the other nodes.

It was working good until my boss asked me to implement a 'search' option. The search had to look in all nodes and explode the tree were it found a match. It took me a while to figure this out, but I finally got it. Here is the solution a came up with:

from django.db.models import Q

def get_parents(self, qs):
    tree_list = {}
    query = Q()
    for node in qs:
        if node.tree_id not in tree_list:
            tree_list[node.tree_id] = []

        parent =  node.parent.pk if node.parent is not None else None,

        if parent not in tree_list[node.tree_id]:
            tree_list[node.tree_id].append(parent)

            query |= Q(lft__lt=node.lft, rght__gt=node.rght, tree_id=node.tree_id)

    return YourModel.objects.filter(query)

It needs only two queries to run, the initial qs passed as an argument and the final queryset returned by the function. The tree_list is a dictionary that stores nodes that were already added to the queryset, it's an optimization and it's not necessary for the algorithm to work. But since I was working with a relative big tree I had to include it.

I guess you could turn this method into a manager and make it more generic, i.e. make it work for any MPTT model and not only YourModel

like image 190
Cesar Canassa Avatar answered Oct 19 '22 11:10

Cesar Canassa


How about:

def qs_ancestors(queryset):
    if isinstance(queryset, EmptyQuerySet):
        return queryset
    new_queryset = queryset.none()
    for obj in queryset:
        new_queryset = new_queryset | obj.get_ancestors()
return new_queryset

It's still len(queryset) clauses. You could potentially reduce the number of clauses a bit by doing a preprocess pass that removes objects that are ancestors of other objects in the queryset, something like:

min_obj_set = []
for obj in queryset.order_by('tree_id', '-level'):
    for obj2 in min_obj_set:
        if obj.is_ancestor_of(obj2):
            break
    else:
        min_obj_set.append(obj)

Although the above snippet is only an example, you'll probably want to use a BST if your querset contains a significant amount of objects.

You'll have to test if this yeilds an increase in speed vs. the larger DB query, though.

like image 26
Michał Modzelewski Avatar answered Oct 19 '22 09:10

Michał Modzelewski