I have the following Django model:
class MarketData(models.Model):
category = models.CharField(max_length=64, null=True)
brand = models.CharField(max_length=128, null=True)
time_period = models.DateField()
sales = models.IntegerField(default=0, null=True)
I am trying to use Django's ORM to calculate the market share of each brand within each time period of my dataset to output to a line chart. The market share calculation itself is simple: the sales of each brand within the time period divided by the total sales in each time period.
I can use Django's ORM to generate a queryset of brands and sales by time period, but I'm struggling to annotate the additional metric of sales for all brands.
My current query:
sales = MarketData.objects.filter(brand__in=['Brand 1', 'Brand 2']) \
.values('time_period', 'brand') \
.annotate(sales=Sum('sales')) \
.order_by('period', '-sales')
My current queryset output:
<QuerySet [{'time_period': datetime.date(2020, 6, 30), 'brand': "Brand 1", 'sales': 21734}, {'time_period': datetime.date(2020, 6, 30), 'brand': 'Brand 2', 'sales': 93622}]>
How can I add another annotation that includes the sales of ALL brands for that time period, not just the brands specified in the filter, and not just the current brand?
I have tried to use a Subquery to get the total market sales by period, but I end up with an aggregate across all periods, or an ungodly-long SQL statement that takes 13 seconds to run. Roughly, the SQL statement that I want is the following, which takes 0.2 seconds:
SELECT
brand
b.time_period,
SUM(sales) / market_total AS share
FROM market_data AS b
JOIN (
-- Create sub-query of total market sales for each period
SELECT
time_period,
SUM(sales) AS market_total
FROM marketdata
GROUP BY
time_period
) AS m ON m.time_period = b.time_period
WHERE
brand IN ('Brand 1', 'Brand 2')
AND b.time_period BETWEEN '2020-01-01' AND '2023-03-31'
GROUP BY
brand,
b.time_period,
market_total
ORDER BY
brand,
b.time_period
but I can't figure out a way to generate this using Django's ORM. I would like to use the ORM instead of raw SQL because I have some additional WHERE clauses that I want to include conditionally, and it would be significantly easier to do the filtering using the ORM within 'if' statements.
I believe you can achieve that with a subquery:
subquery = MarketData.objects.annotate(total_sales=Sum('sales')).values("total_sales")[:1]
sales = MarketData.objects.filter(brand__in=['Brand 1', 'Brand 2']) \
.values('time_period', 'brand') \
.annotate(
sales=Sum('sales'),
total_brands_sales=Subquery(subquery)
)\
.order_by('period', '-sales')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With