When aggregating a queryset, I noticed that if I use an annotation before, I get a wrong result. I can't understand why.
The Code
from django.db.models import QuerySet, Max, F, ExpressionWrapper, DecimalField, Sum
from orders.models import OrderOperation
class OrderOperationQuerySet(QuerySet):
def last_only(self) -> QuerySet:
return self \
.annotate(last_oo_pk=Max('order__orderoperation__pk')) \
.filter(pk=F('last_oo_pk'))
@staticmethod
def _hist_price(orderable_field):
return ExpressionWrapper(
F(f'{orderable_field}__hist_unit_price') * F(f'{orderable_field}__quantity'),
output_field=DecimalField())
def ordered_articles_data(self):
return self.aggregate(
sum_ordered_articles_amounts=Sum(self._hist_price('orderedarticle')))
The Test
qs1 = OrderOperation.objects.filter(order__pk=31655)
qs2 = OrderOperation.objects.filter(order__pk=31655).last_only()
assert qs1.count() == qs2.count() == 1 and qs1[0] == qs2[0] # shows that both querysets contains the same object
qs1.ordered_articles_data()
> {'sum_ordered_articles_amounts': Decimal('3.72')} # expected result
qs2.ordered_articles_data()
> {'sum_ordered_articles_amounts': Decimal('3.01')} # wrong result
How is it possible that this last_only
annotation method can make the aggregation result different (and wrong)?
The "funny" thing is that is seems to happen only when the order contains articles that have the same hist_price
:
Side note
last_only()
and then I call aggregation in a second query, it works as expected.SQL Queries
(note that these are the actual queries but the code above has been slightly simplified, which explains the presence below of COALESCE
and "deleted" IS NULL
.)
-- qs1.ordered_articles_data()
SELECT
COALESCE(
SUM(
("orders_orderedarticle"."hist_unit_price" * "orders_orderedarticle"."quantity")
),
0) AS "sum_ordered_articles_amounts"
FROM "orders_orderoperation"
LEFT OUTER JOIN "orders_orderedarticle"
ON ("orders_orderoperation"."id" = "orders_orderedarticle"."order_operation_id")
WHERE ("orders_orderoperation"."order_id" = 31655 AND "orders_orderoperation"."deleted" IS NULL)
-- qs2.ordered_articles_data()
SELECT COALESCE(SUM(("__col1" * "__col2")), 0)
FROM (
SELECT
"orders_orderoperation"."id" AS Col1,
MAX(T3."id") AS "last_oo_pk",
"orders_orderedarticle"."hist_unit_price" AS "__col1",
"orders_orderedarticle"."quantity" AS "__col2"
FROM "orders_orderoperation" INNER JOIN "orders_order"
ON ("orders_orderoperation"."order_id" = "orders_order"."id")
LEFT OUTER JOIN "orders_orderoperation" T3
ON ("orders_order"."id" = T3."order_id")
LEFT OUTER JOIN "orders_orderedarticle"
ON ("orders_orderoperation"."id" = "orders_orderedarticle"."order_operation_id")
WHERE ("orders_orderoperation"."order_id" = 31655 AND "orders_orderoperation"."deleted" IS NULL)
GROUP BY
"orders_orderoperation"."id",
"orders_orderedarticle"."hist_unit_price",
"orders_orderedarticle"."quantity"
HAVING "orders_orderoperation"."id" = (MAX(T3."id"))
) subquery
When you use any annotation
in the database language(Aggregate Functions) you should to do group by all fields not inside the function, and you can see it inside the subquery
GROUP BY
"orders_orderoperation"."id",
"orders_orderedarticle"."hist_unit_price",
"orders_orderedarticle"."quantity"
HAVING "orders_orderoperation"."id" = (MAX(T3."id"))
As result the goods with the same hist_unit_price
and quantity
is filtered by max id
. So, based on your screen, one of the chocolate
or cafe
is excluded by the having condition.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With