Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django Window annotation using combined with distinct clause

I have a Django model stored in a Postgres DB comprised of values of counts at irregular intervals:

WidgetCount
 - Time
 - Count

I'm trying to use a window function with Lag to give me a previous row's values as an annotation. My problem is when I try to combine it with some distinct date truncation the window function uses the source rows rather than the distinctly grouped ones.

For example if I have the following rows:

time                count
2020-01-20 05:00    15
2020-01-20 06:00    20
2020-01-20 09:00    30
2020-01-21 06:00    35
2020-01-21 07:00    40
2020-01-22 04:00    50
2020-01-22 06:00    54
2020-01-22 09:00    58

And I want to return a queryset showing the first reading per day, I can use:

from django.db.models.functions import Trunc

WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"))

Which gives me:

date        count
01/01/20    15
01/01/21    35
01/01/22    50

I would like to add an annotation which gives me yesterday's value (so I can show the change per day).

date        count   yesterday_count
01/01/20    15
01/01/21    35      15
01/01/22    50      35

If I do:

from django.db.models.functions import Trunc, Lag
from django.db.models import Window

WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"), yesterday_count=Window(expression=Lag("count")))

The second row return gives me 30 for yesterday_count - ie, its showing me the previous row before applying the distinct clause.

If I add a partiion clause like this:

WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"), yesterday_count=Window(expression=Lag("count"), partition_by=F("date")))

Then yesterday_count is None for all rows.

I can do this calculation in Python if I need to but it's driving me a bit mad and I'd like to find out if what I'm trying to do is possible.

Thanks!

like image 967
Ludo Avatar asked Apr 14 '20 21:04

Ludo


1 Answers

I think the main problem is that you're mixing operations that used in annotation generates a grouped query set such as sum with a operation that simples create a new field for each record in the given query set such as yesterday_count=Window(expression=Lag("count")).

So Ordering really matters here. So when you try:

WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"), yesterday_count=Window(expression=Lag("count")))

The result queryset is simply the WidgetCount.objects.distinct("date") annotated, no grouping is perfomed.

I would suggest decoupling your operations so it becomes easier to understand what is happening, and notice you're iterating over the python object so don't need to make any new queries!

Note in using SUM operation as example because I am getting an unexpected error with the FirstValue operator. So I'm posting with Sum to demonstrate the idea which remains the same. The idea should be the same for first value just by changing acc_count=Sum("count") to first_count=FirstValue("count")

for truncDate_groups in Row.objects.annotate(trunc_date=Trunc('time','day')).values("trunc_date")\
                      .annotate(acc_count=Sum("count")).values("acc_count","trunc_date")\
                      .order_by('trunc_date')\
                      .annotate(y_count=Window(Lag("acc_count")))\
                      .values("trunc_date","acc_count","y_count"):
    print(truncDate_groups)

OUTPUT:

{'trunc_date': datetime.datetime(2020, 1, 20, 0, 0, tzinfo=<UTC>), 'acc_count': 65, 'y_count': None}
{'trunc_date': datetime.datetime(2020, 1, 21, 0, 0, tzinfo=<UTC>), 'acc_count': 75, 'y_count': 162}
{'trunc_date': datetime.datetime(2020, 1, 22, 0, 0, tzinfo=<UTC>), 'acc_count': 162, 'y_count': 65}

It turns out FirstValue operator requires to use a Windows function so you can't nest FirtValue and then calculate Lag, so in this scenario I'm not exactly sure if you can do it. The question becomes how to access the First_Value column without nesting windows.

like image 170
Bernardo stearns reisen Avatar answered Nov 06 '22 04:11

Bernardo stearns reisen