Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

django aggregation to lower resolution using grouping by a date range

horrible title, but let me explain: i've got this django model containing a timestamp (date) and the attribute to log - f.e. the number of users consuming some ressource - (value).

class Viewers(models.Model):
    date = models.DateTimeField()
    value = models.IntegerField()

for each 10seconds the table contains the number of users. something like this:

| date | value |
|------|-------|
|  t1  |   15  |
|  t2  |   18  |
|  t3  |   27  |
|  t4  |   25  |
|  ..  |   ..  |
|  t30 |   38  |
|  t31 |   36  |
|  ..  |   ..  |

now i want to generate different statistics from this data, each with another resolution. f.e. for a chart of the last day i don't need the 10second resolution, so i want 5 minute steps (that are build by averaging the values (and maybe also the date) of the rows from t1 to t29, t30 to t59, ...), so that i'll get:

| date | value |
|------|-------|
|  t15 |   21  |
|  t45 |   32  |
|  ..  |   ..  |

the attributes to keep variable are start & end timestamp and the resolution (like 5 minutes). is there a way using the django orm/queryset api and if not, how to reach this with custom sql?

like image 793
kraiz Avatar asked Jun 07 '11 14:06

kraiz


3 Answers

I've been trying to solve this problem in the most 'django' way possible. I've settled for the following. It averages the values for 15minute time slots between start_date and end_date where the column name is'date':

readings = Reading.objects.filter(date__range=(start_date, end_date)) \
   .extra(select={'date_slice': "FLOOR (EXTRACT (EPOCH FROM date) / '900' )"}) \
   .values('date_slice') \
   .annotate(value_avg=Avg('value'))

It returns a dictionary:

 {'value_avg': 1116.4925373134329, 'date_slice': 1546512.0}
 {'value_avg': 1001.2028985507246, 'date_slice': 1546513.0}
 {'value_avg': 1180.6285714285714, 'date_slice': 1546514.0}

The core of the idea comes from this answer to the same question for PHP/SQL. The code passed to extra is for a Postgres DB.

like image 111
Richard Corden Avatar answered Nov 07 '22 00:11

Richard Corden


from django.db.models import Avg

Viewers.objects.filter(date__range=(start_time, end_time)).aggregate(average=Avg('value'))

That will get you the average of all the values between start_time and end_time, returned as a dictionary in the form of { 'average': <the average> }.

start_time and end_time need to be Python datetime objects. So if you have a timestamp, or something, you'll need to convert it first. You can also use datetime.timedelta to calculate the end_time based on the start_time. For a five minute resolution, something like this:

from datetime import timedelta

end_time = start_time + timedelta(minutes=5)
like image 30
Chris Pratt Avatar answered Nov 07 '22 00:11

Chris Pratt


have you looked at the range filter?

https://docs.djangoproject.com/en/dev/ref/models/querysets/#range

The example given in the doc's seems similar to your situation.

like image 1
Timmy O'Mahony Avatar answered Nov 07 '22 01:11

Timmy O'Mahony