Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Maximum of an annotation after a group by

Tags:

python

django

I would like to compute the maximum of a_priority for each group of (b, c) pairs.

a_priority is an annotation based on a case/when mapping strings to priority values.

from django.db.models import Max, Case, When, IntegerField
qs = MyObject.objects.all()
qs = qs.annotate(
    a_priority=Case(
        When(a='A', then=1), 
        When(a='S', then=2),
        When(a='Q', then=3),        
        output_field=IntegerField()
    )
)
qs = qs.values("b", "c").annotate(Max("a_priority"))

I get the following error:

KeyError: 'a_priority'

I believe the qs.values("b", "c") filters out my annotation a_priority. Behavior is different with any actual field, providing the max of the field.

My django version is 1.10 on python 3.

like image 550
Omar Avatar asked Oct 29 '16 10:10

Omar


2 Answers

Have you tried putting Case expression directly into Max? It is possible since Django 1.8.

from django.db.models import Max, Case, When, IntegerField
qs = MyObject.objects.all()
a_priority=Case(
    When(a='A', then=1), 
    When(a='S', then=2),
    When(a='Q', then=3),        
    output_field=IntegerField()
)
qs = qs.values("b", "c").annotate(max_a_priority=Max(a_priority))
like image 130
Piotr Ćwiek Avatar answered Nov 19 '22 13:11

Piotr Ćwiek


I believe the qs.values("b", "c") filters out my annotation a_priority.

You are correct! values constrain the columns that are returned in the resulting QuerySet. But since the QuerySet is linked to the original model, you can filter using columns not provided in values (ex: qs.values('b', 'c').filter('a'='A')).

The best way to accomplish your task is Piotr Ćwiek's answer. But to make the answers more comprehensive (and to get to know aggregations better):

  • explicitly include a_priority in your values:

    qs1 = qs.values("b", "c", "a_priority").annotate(max_priority=Max("a_priority")).distinct()
    # Or
    qs2 = qs.values("b", "c", "a_priority").annotate(max_priority=Max("a_priority")).filter(a_priority=F('max_a_priority'))
    
  • reverse the order of values and annotate:

    qs3 = qs.annotate(max_priority=Max("a_priority")).values("b", "c").distinct()
    

Django docs explain why distinct (or filter) is required here, and not in Piotr's answer:

If the values() clause precedes the annotate(), the annotation will be computed using the grouping described by the values() clause.

So a_priority is also included in the GROUP BY clause in qs1 and qs2, hence distinct is required.

However, if the annotate() clause precedes the values() clause, the annotations will be generated over the entire query set. In this case, the values() clause only constrains the fields that are generated on output.

This explains qs3. Internally, Django does GROUP BY by id to get results over the entire QuerySet.

like image 34
jatinderjit Avatar answered Nov 19 '22 11:11

jatinderjit