Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django - Annotate on ManyToMany fields using an intermediate table

I'm building a small search system for a Django project (yes I know, there's already lots of products doing this but I'd like to try it, just for fun). I basically have the following models:

class Word(models.Model):
    """ A searchable word.
    We only store the slugified value
    """
    slug = models.SlugField(unique = True)

class Searchable(models.Model):
    """ Superclass for Searchable objects.
    """
    words = models.ManyToManyField(
        Word,
        through='WordCount')

class WordCount(models.Model):
   """ Occurences of a word in a Searchable object.
    """
    word = models.ForeignKey(Word)
    item = models.ForeignKey(Searchable)
    count = models.IntegerField()

So for example, I create an object Page (subclassing Searchable) with the text "Hello StackOverflow, I have a Django question". The system will create a Word instance for each words in this sentence, and for each a WordCount instance saying that each word appears once in the text.

Making a query to get all Searchable instances containing one a more word works fine (searchable_text extract the words and makes a list from it):

def search(query)
    tokens = searchable_text(query)
    words = Word.objects.filter(
                        reduce(operator.or_,
                               [models.Q(slug__contains = t)
                                for t in tokens]))

    return Searchable.objects.filter(words__in = words)

Now what I'd like to do is to use the intermediate relation to order the results. I'd like to keep a QuerySet so the following code will not work, but gives the idea of what I want to do (with ugly patching to make annotations):

def search(query)
    tokens = searchable_text(query)
    words = Word.objects.filter(
                        reduce(operator.or_,
                               [models.Q(slug__contains = t)
                                for t in tokens]))
    results = []
    for obj in Searchable.objects.filter(words__in = words):
        matching_words = obj.wordcount_set.filter(word__in = words)
        obj.weight = sum([w.count for w in matching_words])
        results.append(obj)

    return sorted(results,
                  reverse = True,
                  key = lambda x: x.weight)

So basically: - I get all Word objects contained in the query (or partially matching, if I search for "Stack", the Word "StackOverflow" will be taken into account) - I get all objects that have a relation with any of those words - for each objects, I select all related WordCount objects that are related with a Word in the list of Word previously computed, then do the sum of the 'count' attribute and store it as an annotation 'weight' - I sort my objects on the 'weight'

I don't know if that's doable with QuerySet but I'd like to keep the format for some extra actions after (like filtering out some results for example).

I know there lots of improvements possible but that would be a good start.

Thanks for the answers, Vincent

like image 861
Vincent Avatar asked Nov 05 '22 01:11

Vincent


1 Answers

Try

Searchable.objects.filter(words__in=words).annotate(
    weight=models.Sum('wordcount__count')).order_by('-weight')
like image 68
okm Avatar answered Nov 14 '22 21:11

okm