Django filter, paginate and annotate paginated results

I have two models, Report and ReportSubscriber, and I want to count the number of subscribers for each Report.

One solution is annotating. I have lots of reports, so annotating all of them takes ~6 seconds. I thought it might be better to annotate after paginating:

filter_search = ReportFilter(request.GET, queryset=Report.objects.filter(
        created_at__gt=start_date,
        created_at__lte=end_date,
        is_confirmed__exact=True,
    ).annotate(sub_count=Count("reportsubscriber")).order_by('-sub_count'))

paginator = Paginator(filter_search, 20)

result = paginator.page(1).object_list.annotate(
                sub_count=Count("reportsubscriber"))

It worked, but it took the same time, and when I checked the queries, it still went through all rows in the report_subscriber table. So I tried using .extra():

filter_search = ReportFilter(request.GET, queryset=Report.objects.filter(
            created_at__gt=start_date,
            created_at__lte=end_date,
            is_confirmed__exact=True,
        ))

paginator = Paginator(filter_search, 20)
paged_reports = paginator.page(1)

result = filter_search.qs.extra(
            select={
                'sub_count': 'SELECT COUNT(*) FROM reports LEFT OUTER JOIN report_subscribers  \
                             ON (reports.id = report_subscribers.id) \
                             WHERE reports.id = report_subscribers.id \
                             AND report_subscribers.report_id IN %s \
                            ' % "(%s)" % ",".join([str(r.id) for r in paged_reports.object_list])
            },
            order_by=['sub_count']
        )

But this still didn't work: I got the same static number of subscribers for every report. What am I missing, and is there a better way to accomplish this? Thanks

Mark asked Nov 13 '22 00:11



1 Answer

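I can't give you a definitive answer, but I believe the problem is that, even when paginated, the query still has to do most of the work: the paginator needs a total count to know how many pages there are, and ordering by sub_count means the count has to be computed for every report before the first 20 can be picked. I should think you'll be better off dropping the annotation (and any ordering that depends on it) before pagination: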

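from django.core.paginator import Paginator
from django.db.models import Count

# No annotation here, so the ordering must use a real column; '-created_at'
# is only a placeholder (ordering by sub_count would need the annotation).
filter_search = ReportFilter(request.GET, queryset=Report.objects.filter(
        created_at__gt=start_date,
        created_at__lte=end_date,
        is_confirmed__exact=True,
    ).order_by('-created_at'))

# Paginate the filtered queryset (.qs), as in your .extra() attempt.
paginator = Paginator(filter_search.qs, 20)

# Add the subscriber count to the page's queryset.
result = paginator.page(1).object_list.annotate(
                sub_count=Count("reportsubscriber"))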
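
For intuition, here is a rough, framework-free sketch of what a paginator does with the object list it is given: it asks for a total count, then takes one slice. RecordingList and get_page are hypothetical stand-ins written for this illustration, not Django classes, and the real Paginator has more options (orphans, validation) than shown here.

```python
import math


class RecordingList:
    """Hypothetical stand-in for a queryset that logs how it is accessed."""

    def __init__(self, items):
        self._items = list(items)
        self.calls = []

    def count(self):
        self.calls.append("count")
        return len(self._items)

    def __getitem__(self, key):
        # Only slicing is exercised in this sketch.
        self.calls.append(f"slice {key.start}:{key.stop}")
        return self._items[key]


def get_page(object_list, number, per_page):
    """Return one page of items: one count call, then one slice."""
    total = object_list.count()
    num_pages = max(1, math.ceil(total / per_page))
    if not 1 <= number <= num_pages:
        raise ValueError("page out of range")
    bottom = (number - 1) * per_page
    top = min(bottom + per_page, total)
    return object_list[bottom:top]


reports = RecordingList(range(95))
page = get_page(reports, 1, 20)
print(reports.calls)
```

Neither step needs sub_count unless the ordering depends on it, which is why the annotation can be deferred until the page is known.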

I trust from your example that object_list is a queryset that you can annotate. If it's just a list of objects, you can annotate each page of results with something like:

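# Collect the primary keys of the reports on this page.
page_ids = [report.id for report in paginator.page(1).object_list]
# Count subscribers for those reports only. Note that id__in does not keep
# the page's ordering, so re-apply order_by() if the order matters.
result = Report.objects.filter(id__in=page_ids).annotate(
                sub_count=Count("reportsubscriber"))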
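
The same "page first, count second" idea can be sketched in plain SQL, which may make it easier to see what the database is asked to do. This is only an illustration using the standard-library sqlite3 module: the table and column names (reports, report_subscribers, report_id) are taken from the raw SQL in the question, while created_at as the ordering column, the sample data, and the page_with_counts helper are assumptions made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reports (id INTEGER PRIMARY KEY, created_at TEXT);
    CREATE TABLE report_subscribers (
        id INTEGER PRIMARY KEY,
        report_id INTEGER REFERENCES reports(id)
    );
""")
# Five sample reports; report n gets n - 1 subscribers, so report 1 has none.
conn.executemany(
    "INSERT INTO reports (id, created_at) VALUES (?, ?)",
    [(n, f"2022-11-{n:02d}") for n in range(1, 6)],
)
conn.executemany(
    "INSERT INTO report_subscribers (report_id) VALUES (?)",
    [(n,) for n in range(1, 6) for _ in range(n - 1)],
)


def page_with_counts(conn, page_number, per_page):
    """Hypothetical helper: return [(report_id, sub_count), ...] for one page."""
    # Step 1: fetch only the ids on the requested page, in display order.
    offset = (page_number - 1) * per_page
    page_ids = [row[0] for row in conn.execute(
        "SELECT id FROM reports ORDER BY created_at DESC LIMIT ? OFFSET ?",
        (per_page, offset),
    )]
    if not page_ids:
        return []
    # Step 2: count subscribers for those ids only.
    placeholders = ",".join("?" * len(page_ids))
    counts = dict(conn.execute(
        "SELECT report_id, COUNT(*) FROM report_subscribers "
        f"WHERE report_id IN ({placeholders}) GROUP BY report_id",
        page_ids,
    ))
    # Step 3: reattach counts in page order; ids with no rows have zero subscribers.
    return [(report_id, counts.get(report_id, 0)) for report_id in page_ids]


first_page = page_with_counts(conn, 1, 2)
last_page = page_with_counts(conn, 3, 2)
```

The catch is that this only helps when the page order does not depend on the count. If the list has to be sorted by sub_count, the counts for all reports are needed before the first page can be chosen.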

But this is all shooting in the dark. Nothing you're doing looks too crazy, so unless your dataset is huge, I can only imagine that your problem is a poorly indexed query. You'll really want to profile the actual query that's being generated. You can get the SQL by running the following in your project's Django shell for a given start_date and end_date:

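from django.db.models import Count

# Build the slow queryset (annotation included) and print its SQL.
print(Report.objects.filter(
        created_at__gt=start_date,
        created_at__lte=end_date,
        is_confirmed__exact=True,
    ).annotate(sub_count=Count("reportsubscriber")).order_by('-sub_count').query)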

Then run that query from the psql command line on your database, prefixed with EXPLAIN. You'll have to do a bit of reading to figure out how to interpret the output.
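
If you want to get a feel for reading query plans before touching the real database, here is a small sketch using the standard-library sqlite3 module and SQLite's EXPLAIN QUERY PLAN (a stand-in for PostgreSQL's EXPLAIN, whose output looks different). The table layout, the index name idx_subscriber_report, and the query_plan helper are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report_subscribers (id INTEGER PRIMARY KEY, report_id INTEGER)"
)

QUERY = "SELECT COUNT(*) FROM report_subscribers WHERE report_id = ?"


def query_plan(conn, sql, params=()):
    """Hypothetical helper: return the planner's steps as one string."""
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without an index on report_id, the planner has to scan the whole table.
plan_before = query_plan(conn, QUERY, (1,))

# With the index in place, the same lookup can use it.
conn.execute(
    "CREATE INDEX idx_subscriber_report ON report_subscribers (report_id)"
)
plan_after = query_plan(conn, QUERY, (1,))

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, so look for whether the index name shows up rather than for a specific phrase. Django normally creates an index for foreign keys by default, so on the real tables the missing index, if there is one, may well be on a filter column such as created_at or is_confirmed instead.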

acjay answered Nov 14 '22 22:11
