I have a Django model stored in a Postgres DB comprised of values of counts at irregular intervals:
WidgetCount
- Time
- Count
I'm trying to use a window function with Lag to give me a previous row's values as an annotation. My problem is when I try to combine it with some distinct date truncation the window function uses the source rows rather than the distinctly grouped ones.
For example if I have the following rows:
time count
2020-01-20 05:00 15
2020-01-20 06:00 20
2020-01-20 09:00 30
2020-01-21 06:00 35
2020-01-21 07:00 40
2020-01-22 04:00 50
2020-01-22 06:00 54
2020-01-22 09:00 58
And I want to return a queryset showing the first reading per day, I can use:
from django.db.models.functions import Trunc
WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"))
Which gives me:
date count
01/01/20 15
01/01/21 35
01/01/22 50
I would like to add an annotation which gives me yesterday's value (so I can show the change per day).
date count yesterday_count
01/01/20 15
01/01/21 35 15
01/01/22 50 35
If I do:
from django.db.models.functions import Trunc, Lag
from django.db.models import Window
WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"), yesterday_count=Window(expression=Lag("count")))
The second row return gives me 30 for yesterday_count - ie, its showing me the previous row before applying the distinct clause.
If I add a partiion clause like this:
WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"), yesterday_count=Window(expression=Lag("count"), partition_by=F("date")))
Then yesterday_count is None for all rows.
I can do this calculation in Python if I need to but it's driving me a bit mad and I'd like to find out if what I'm trying to do is possible.
Thanks!
I think the main problem is that you're mixing operations that used in annotation generates a grouped query set such as sum with a operation that simples create a new field for each record in the given query set such as yesterday_count=Window(expression=Lag("count"))
.
So Ordering really matters here. So when you try:
WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"), yesterday_count=Window(expression=Lag("count")))
The result queryset is simply the WidgetCount.objects.distinct("date") annotated, no grouping is perfomed.
I would suggest decoupling your operations so it becomes easier to understand what is happening, and notice you're iterating over the python object so don't need to make any new queries!
Note in using SUM operation as example because I am getting an unexpected error with the FirstValue operator. So I'm posting with Sum to demonstrate the idea which remains the same. The idea should be the same for first value just by changing acc_count=Sum("count")
to first_count=FirstValue("count")
for truncDate_groups in Row.objects.annotate(trunc_date=Trunc('time','day')).values("trunc_date")\
.annotate(acc_count=Sum("count")).values("acc_count","trunc_date")\
.order_by('trunc_date')\
.annotate(y_count=Window(Lag("acc_count")))\
.values("trunc_date","acc_count","y_count"):
print(truncDate_groups)
OUTPUT:
{'trunc_date': datetime.datetime(2020, 1, 20, 0, 0, tzinfo=<UTC>), 'acc_count': 65, 'y_count': None}
{'trunc_date': datetime.datetime(2020, 1, 21, 0, 0, tzinfo=<UTC>), 'acc_count': 75, 'y_count': 162}
{'trunc_date': datetime.datetime(2020, 1, 22, 0, 0, tzinfo=<UTC>), 'acc_count': 162, 'y_count': 65}
It turns out FirstValue operator requires to use a Windows function so you can't nest FirtValue and then calculate Lag, so in this scenario I'm not exactly sure if you can do it. The question becomes how to access the First_Value column without nesting windows.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With