Django prefetch related no duplicates with intermediate table

I have been trying to solve this problem for a day now.

With the models

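class Quote(models.Model):
    text = models.TextField()
    # String references allow Source and Tag to be defined further down;
    # on_delete is required from Django 2.0 onward.
    source = models.ForeignKey('Source', on_delete=models.CASCADE)
    tags = models.ManyToManyField('Tag')
    ...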

class Source(models.Model):
    title = models.CharField(max_length=100)
    ...

class Tag(models.Model):
    name = models.CharField(max_length=30,unique=True)
    slug = models.SlugField(max_length=40,unique=True)
    ...

I am trying to model the world of quotes, with these relationships: one Source has many Quotes, and one Quote has many Tags. The problem is:

  1. How do I get all Tags that are contained in a Source (through its Quotes),
  2. with the minimum possible number of queries,
  3. and with the number of times each Tag is used in that Source?

I first tried the naive approach, without prefetch_related, using a model method on Source:

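# Requires: from django.db.models import Count
def source_tags(self):
    # annotate() already groups by tag, so distinct() is redundant;
    # order_by() lets the database sort by usage instead of Python
    return (Tag.objects.filter(quote__source=self)
            .annotate(usage_count=Count('quote')).order_by('-usage_count'))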

And in the template:

{% for tag in source.source_tags|slice:":5" %}
    {{ tag.name }} ({{ tag.usage_count }})
{% endfor %}

Now I have

sources = Source.objects.all().prefetch_related('quote_set__tags')

In the template, I have no idea how to iterate correctly to get the Tags for one Source, or how to count them instead of listing duplicate Tags.

asked Oct 21 '22 09:10 by niklas


1 Answer

This will get the result in a single SQL query:

# views.py
from django.db.models import Count
from .models import Source


def get_tag_count():
    """
    Returns the count of tags associated with each source
    """
    sources = Source.objects.annotate(tag_count=Count('quote__tags')) \
                         .values('title', 'quote__tags__name', 'tag_count') \
                         .order_by('title')
    # Group the results as
    # {source: {tag: count}}
    grouped = {}
    for source in sources:
        title = source['title']
        tag = source['quote__tags__name']
        count = source['tag_count']
        # A source without tagged quotes yields tag None with count 0
        grouped.setdefault(title, {})[tag] = count
    return grouped



# in template.html

{% for source, tags in sources.items %}

    <h3>{{ source }}</h3>

    {% for tag, count in tags.items %}
        {% if tag %}
            <p>{{ tag }} : {{ count }}</p>
        {% endif %}
    {% endfor %}

{% endfor %}
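The question also asks for the most-used tags first, limited to five per source. Here is a minimal sketch of that step, assuming the {source: {tag: count}} dictionary returned by get_tag_count() above. The helper name top_tags and its limit parameter are hypothetical, not part of the original answer.

```python
def top_tags(grouped, limit=5):
    """Return {title: [(tag, count), ...]}, most-used tags first.

    `grouped` is assumed to have the shape {title: {tag_name: count}}.
    Entries whose tag name is None (sources without tags) are skipped.
    """
    result = {}
    for title, tags in grouped.items():
        ranked = sorted(
            ((tag, count) for tag, count in tags.items() if tag is not None),
            # Highest count first; ties are broken alphabetically by tag name
            key=lambda item: (-item[1], item[0]),
        )
        result[title] = ranked[:limit]
    return result
```

Passing the result to the template instead of the raw dictionary would let the inner loop iterate over (tag, count) pairs that are already ordered and truncated, so the {% if tag %} check becomes unnecessary.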

Complementary tests :)

# tests.py
from django.test import TestCase
from .models import Source, Tag, Quote
from .views import get_tag_count


class SourceTags(TestCase):

    def setUp(self):
        abc = Source.objects.create(title='ABC')
        xyz = Source.objects.create(title='XYZ')

        inspire = Tag.objects.create(name='Inspire', slug='inspire')
        lol = Tag.objects.create(name='lol', slug='lol')

        q1 = Quote.objects.create(text='I am inspired foo', source=abc)
        q2 = Quote.objects.create(text='I am inspired bar', source=abc)
        q3 = Quote.objects.create(text='I am lol bar', source=abc)
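        # Direct assignment to a many-to-many field was removed in Django 2.0;
        # add() writes to the join table immediately, so no save() is needed
        q1.tags.add(inspire)
        q2.tags.add(inspire)
        q3.tags.add(inspire, lol)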

    def test_count(self):
        # Ensure that only 1 SQL query is done
        with self.assertNumQueries(1):
            sources = get_tag_count()
            self.assertEqual(sources['ABC']['Inspire'], 3)
            self.assertEqual(sources['ABC']['lol'], 1)

I have used the ORM's annotate() and values() methods. annotate() with Count generates the necessary joins and the grouping automatically, and values() restricts the result to the specified fields, so the whole result comes back from a single query.
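To illustrate roughly what joins and grouping such a query asks the database to perform, here is a sketch using the standard-library sqlite3 module with the same data as the tests above. The table and column names are assumptions (Django derives the real ones from the app label), and the SQL is hand-written for illustration, not the ORM's actual output.

```python
import sqlite3

# Hypothetical table names; Django would prefix them with the app label.
SCHEMA = """
CREATE TABLE source (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE quote (id INTEGER PRIMARY KEY, text TEXT,
                    source_id INTEGER REFERENCES source(id));
CREATE TABLE quote_tags (quote_id INTEGER REFERENCES quote(id),
                         tag_id INTEGER REFERENCES tag(id));
"""

# One query: walk source -> quote -> quote_tags -> tag and count the
# join-table rows for each (source title, tag name) pair.
QUERY = """
SELECT s.title, t.name, COUNT(qt.tag_id) AS tag_count
FROM source s
LEFT OUTER JOIN quote q ON q.source_id = s.id
LEFT OUTER JOIN quote_tags qt ON qt.quote_id = q.id
LEFT OUTER JOIN tag t ON t.id = qt.tag_id
GROUP BY s.title, t.name
ORDER BY s.title
"""


def tag_counts(conn):
    """Group the rows as {source title: {tag name: count}}."""
    grouped = {}
    for title, name, count in conn.execute(QUERY):
        grouped.setdefault(title, {})[name] = count
    return grouped


def demo():
    """Load the data from the tests above into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO source VALUES (?, ?)",
                     [(1, "ABC"), (2, "XYZ")])
    conn.executemany("INSERT INTO tag VALUES (?, ?)",
                     [(1, "Inspire"), (2, "lol")])
    conn.executemany("INSERT INTO quote VALUES (?, ?, ?)",
                     [(1, "I am inspired foo", 1),
                      (2, "I am inspired bar", 1),
                      (3, "I am lol bar", 1)])
    conn.executemany("INSERT INTO quote_tags VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 1), (3, 2)])
    return tag_counts(conn)
```

In this sketch the source XYZ has no quotes, so the outer joins produce a single row for it with a NULL tag name and a count of 0. That is the case the {% if tag %} check in the template filters out.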

answered Oct 27 '22 10:10 by Pratyush