Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django: aggregate returns a wrong result after using annotate

When aggregating a queryset, I noticed that if I use an annotation before, I get a wrong result. I can't understand why.

The Code

from django.db.models import QuerySet, Max, F, ExpressionWrapper, DecimalField, Sum
from orders.models import OrderOperation

class OrderOperationQuerySet(QuerySet):
    def last_only(self) -> QuerySet:
        return self \
            .annotate(last_oo_pk=Max('order__orderoperation__pk')) \
            .filter(pk=F('last_oo_pk'))

    @staticmethod
    def _hist_price(orderable_field):
        return ExpressionWrapper(
            F(f'{orderable_field}__hist_unit_price') * F(f'{orderable_field}__quantity'),
            output_field=DecimalField())

    def ordered_articles_data(self):
        return self.aggregate(
            sum_ordered_articles_amounts=Sum(self._hist_price('orderedarticle')))

The Test

qs1 = OrderOperation.objects.filter(order__pk=31655)
qs2 = OrderOperation.objects.filter(order__pk=31655).last_only()
assert qs1.count() == qs2.count() == 1 and qs1[0] == qs2[0]  # shows that both querysets contains the same object

qs1.ordered_articles_data()
> {'sum_ordered_articles_amounts': Decimal('3.72')}  # expected result

qs2.ordered_articles_data()
> {'sum_ordered_articles_amounts': Decimal('3.01')}  # wrong result

How is it possible that this last_only annotation method can make the aggregation result different (and wrong)?

The "funny" thing is that is seems to happen only when the order contains articles that have the same hist_price: enter image description here

Side note

  • I can confirm that the SQL created by Django ORM is probably wrong, because when I force execution of last_only() and then I call aggregation in a second query, it works as expected.
  • https://docs.djangoproject.com/en/1.11/topics/db/aggregation/#combining-multiple-aggregations could be an explanation?

SQL Queries (note that these are the actual queries but the code above has been slightly simplified, which explains the presence below of COALESCE and "deleted" IS NULL.)

-- qs1.ordered_articles_data()

SELECT
    COALESCE(
        SUM(
            ("orders_orderedarticle"."hist_unit_price" * "orders_orderedarticle"."quantity")
        ),
        0) AS "sum_ordered_articles_amounts"
FROM "orders_orderoperation"
    LEFT OUTER JOIN "orders_orderedarticle"
        ON ("orders_orderoperation"."id" = "orders_orderedarticle"."order_operation_id")
WHERE ("orders_orderoperation"."order_id" = 31655 AND "orders_orderoperation"."deleted" IS NULL)

-- qs2.ordered_articles_data()

SELECT COALESCE(SUM(("__col1" * "__col2")), 0)
FROM (
    SELECT
        "orders_orderoperation"."id" AS Col1,
        MAX(T3."id") AS "last_oo_pk",
        "orders_orderedarticle"."hist_unit_price" AS "__col1",
        "orders_orderedarticle"."quantity" AS "__col2"
    FROM "orders_orderoperation" INNER JOIN "orders_order"
        ON ("orders_orderoperation"."order_id" = "orders_order"."id")
        LEFT OUTER JOIN "orders_orderoperation" T3
            ON ("orders_order"."id" = T3."order_id")
        LEFT OUTER JOIN "orders_orderedarticle"
            ON ("orders_orderoperation"."id" = "orders_orderedarticle"."order_operation_id")
    WHERE ("orders_orderoperation"."order_id" = 31655 AND "orders_orderoperation"."deleted" IS NULL)
    GROUP BY
        "orders_orderoperation"."id",
        "orders_orderedarticle"."hist_unit_price",
        "orders_orderedarticle"."quantity"
    HAVING "orders_orderoperation"."id" = (MAX(T3."id"))
) subquery
like image 643
David D. Avatar asked Nov 06 '22 20:11

David D.


1 Answers

When you use any annotation in the database language(Aggregate Functions) you should to do group by all fields not inside the function, and you can see it inside the subquery

GROUP BY
    "orders_orderoperation"."id",
    "orders_orderedarticle"."hist_unit_price",
    "orders_orderedarticle"."quantity"
HAVING "orders_orderoperation"."id" = (MAX(T3."id"))

As result the goods with the same hist_unit_price and quantity is filtered by max id. So, based on your screen, one of the chocolate or cafe is excluded by the having condition.

like image 147
Brown Bear Avatar answered Nov 13 '22 18:11

Brown Bear