I'm building a small search system for a Django project (yes I know, there's already lots of products doing this but I'd like to try it, just for fun). I basically have the following models:
class Word(models.Model):
""" A searchable word.
We only store the slugified value
"""
slug = models.SlugField(unique = True)
class Searchable(models.Model):
""" Superclass for Searchable objects.
"""
words = models.ManyToManyField(
Word,
through='WordCount')
class WordCount(models.Model):
""" Occurences of a word in a Searchable object.
"""
word = models.ForeignKey(Word)
item = models.ForeignKey(Searchable)
count = models.IntegerField()
So for example, I create an object Page (subclassing Searchable) with the text "Hello StackOverflow, I have a Django question". The system will create a Word instance for each words in this sentence, and for each a WordCount instance saying that each word appears once in the text.
Making a query to get all Searchable instances containing one a more word works fine (searchable_text extract the words and makes a list from it):
def search(query)
tokens = searchable_text(query)
words = Word.objects.filter(
reduce(operator.or_,
[models.Q(slug__contains = t)
for t in tokens]))
return Searchable.objects.filter(words__in = words)
Now what I'd like to do is to use the intermediate relation to order the results. I'd like to keep a QuerySet so the following code will not work, but gives the idea of what I want to do (with ugly patching to make annotations):
def search(query)
tokens = searchable_text(query)
words = Word.objects.filter(
reduce(operator.or_,
[models.Q(slug__contains = t)
for t in tokens]))
results = []
for obj in Searchable.objects.filter(words__in = words):
matching_words = obj.wordcount_set.filter(word__in = words)
obj.weight = sum([w.count for w in matching_words])
results.append(obj)
return sorted(results,
reverse = True,
key = lambda x: x.weight)
So basically: - I get all Word objects contained in the query (or partially matching, if I search for "Stack", the Word "StackOverflow" will be taken into account) - I get all objects that have a relation with any of those words - for each objects, I select all related WordCount objects that are related with a Word in the list of Word previously computed, then do the sum of the 'count' attribute and store it as an annotation 'weight' - I sort my objects on the 'weight'
I don't know if that's doable with QuerySet but I'd like to keep the format for some extra actions after (like filtering out some results for example).
I know there lots of improvements possible but that would be a good start.
Thanks for the answers, Vincent
Try
Searchable.objects.filter(words__in=words).annotate(
weight=models.Sum('wordcount__count')).order_by('-weight')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With