Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cumulative (running) sum with django orm and postgresql

Is it possible to calculate the cumulative (running) sum using django's orm? Consider the following model:

class AModel(models.Model):
    a_number = models.IntegerField()

with a set of data where a_number = 1. Such that I have a number ( >1 ) of AModel instances in the database all with a_number=1. I'd like to be able to return the following:

AModel.objects.annotate(cumsum=??).values('id', 'cumsum').order_by('id')
>>> ({id: 1, cumsum: 1}, {id: 2, cumsum: 2}, ... {id: N, cumsum: N})

Ideally I'd like to be able to limit/filter the cumulative sum. So in the above case I'd like to limit the result to cumsum <= 2

I believe that in postgresql one can achieve a cumulative sum using window functions. How is this translated to the ORM?

like image 616
wrdeman Avatar asked Apr 20 '17 11:04

wrdeman


People also ask

What is the use of cumulative sum?

Cumulative sums, or running totals, are used to display the total sum of data as it grows with time (or any other series or progression). This lets you view the total contribution so far of a given measure against time.

What is annotate in Django?

Appending the annotate() clause onto a QuerySet lets you add an attribute to each item in the QuerySet, like if you wanted to count the amount of articles in each category. However, sometimes you only want to count objects that match a certain condition, for example only counting articles that are published.


3 Answers

For reference, starting with Django 2.0 it is possible to use the Window function to achieve this result:

AModel.objects.annotate(cumsum=Window(Sum('a_number'), order_by=F('id').asc()))\
              .values('id', 'cumsum').order_by('id', 'cumsum')
like image 63
Campi Avatar answered Sep 21 '22 23:09

Campi


From Dima Kudosh's answer and based on https://stackoverflow.com/a/5700744/2240489 I had to do the following: I removed the reference to PARTITION BY in the sql and replaced with ORDER BY resulting in.

AModel.objects.annotate(
    cumsum=Func(
        Sum('a_number'), 
        template='%(expressions)s OVER (ORDER BY %(order_by)s)', 
        order_by="id"
    ) 
).values('id', 'cumsum').order_by('id', 'cumsum')

This gives the following sql:

SELECT "amodel"."id",
SUM("amodel"."a_number") 
OVER (ORDER BY id) AS "cumsum" 
FROM "amodel" 
GROUP BY "amodel"."id" 
ORDER BY "amodel"."id" ASC, "cumsum" ASC

Dima Kudosh's answer was not summing the results but the above does.

like image 27
wrdeman Avatar answered Sep 24 '22 23:09

wrdeman


For posterity, I found this to be a good solution for me. I didn't need the result to be a QuerySet, so I could afford to do this, since I was just going to plot the data using D3.js:

import numpy as np
import datettime

today = datetime.datetime.date()

raw_data = MyModel.objects.filter('date'=today).values_list('a_number', flat=True)

cumsum = np.cumsum(raw_data)
like image 34
Scott Skiles Avatar answered Sep 21 '22 23:09

Scott Skiles