Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

QuerySet with .latest() for each day

Tags:

django

I have a basic model like:

class Stats(models.Model):

   created = models.DateTimeField(auto_now_add=True)
   growth = models.IntegerField()

I run a celery job every 10 minutes to create a new stats object.

Using .latest() on a QuerySet gives me the latest Stats object to date.

However, I want a list with one Stats object for each day.

Consider the following:

Stats(growth=100) #created 1/1/13 23:50
Stats(growth=200) #created 1/1/13 23:59
Stats(growth=111) #created 1/2/13 23:50
Stats(growth=222) #created 1/2/13 23:59

The QuerySet should return the latest for each day. In the example the one with 200 and 222 growth.

In SQL I'd fire up a subquery with the max for each day and join it together.

Since I don't want to use raw SQL, is there a way to do this with the django ORM?

like image 651
Jay Avatar asked Jun 20 '13 11:06

Jay


People also ask

How do I get the last element in QuerySet?

last() to retrieve last and Model. objects. all().

What is mean by QuerySet?

A QuerySet is a collection of data from a database. A QuerySet is built up as a list of objects. QuerySets makes it easier to get the data you actually need, by allowing you to filter and order the data.

What does First () do in Django?

first, which takes a query set and returns the first element, or None if the query set was empty. Note, that this is not identical. The get method will ensure that there is exactly one row in the database that matches the query.


1 Answers

Unfortunately there's no way (that I'm aware of.. I looked pretty hard) to avoid using some kind of raw sql to accomplish what you want to do (with your current model; see the very end for another suggestion). But you can minimise the impact by writing as little raw sql as possible. In practise django sites do not need to be portable across different databases. Unless you plan on using this app elsewhere or releasing it publicly, you should be ok.

The below example is for sqlite. You could keep a mapping of database types to date functions, look up the type of driver, and replace the function with the correct one if you needed to.

>>> for stat in Stats.objects.all():
...     print stat.created, stat.growth
...
2013-06-22 13:41:25.334262+00:00 3
2013-06-22 13:41:40.473373+00:00 3
2013-06-22 13:41:44.921247+00:00 4
2013-06-22 13:41:47.533102+00:00 5
2013-06-23 13:41:58.458250+00:00 6
2013-06-23 13:42:01.282702+00:00 3
2013-06-23 13:42:03.633236+00:00 1

>>> last_stat_per_day = Stats.objects.extra( 
            select={'the_date': 'date(created)' }
        ).values_list('the_date').annotate(max_date=Max('created'))

>>> last_stat_per_day
[(u'2013-06-22', datetime.datetime(2013, 6, 22, 13, 41, 47, 533102, tzinfo=<UTC>)), (u'2013-06-23', datetime.datetime(2013, 6, 23, 13, 42, 3, 633236, tzinfo=<UTC>))]

>>> max_dates = [item[1] for item in last_stat_per_day]
>>> max_dates
[datetime.datetime(2013, 6, 22, 13, 41, 47, 533102, tzinfo=<UTC>), 
 datetime.datetime(2013, 6, 23, 13, 42, 3, 633236, tzinfo=<UTC>)]

>>> stats = Stats.objects.filter(created__in=max_dates)
>>> for stat in stats:
...     print stat.created, stat.growth
...
2013-06-22 13:41:47.533102+00:00 5
2013-06-23 13:42:03.633236+00:00 1

I had written here before that this was only a single query but I lied - the values_list needs to be transformed to only return the max_date for the successive query, which means running the statement. It's only 2 queries though, which would be significantly better than an N+1 function.

The non-portable bit is this:

last_stat_per_day = Stats.objects.extra( 
    select={'the_date': 'date(created)' }
).values_list('the_date').annotate(max_date=Max('created'))

Using extra isn't ideal, but the raw sql here is simple, and lends itself nicely to a database driver dependent replacement. Only the date(created) needs to be replaced. You can wrap this up in a method on a custom manager if you like, and you've then successfully abstracted this mess away in a single location.

The other option is to just add a DateField to your model, and then you don't need to use extra at all. You'd simply replace the values_list call with a values_list('created_date'), remove the extra completely, and call it a day. The cost is obvious - more storage space required. It's also non-intuitive as to why you have a Date and a DateTime field on the same model. Keeping the two in sync may also pose problems.

like image 145
Josh Smeaton Avatar answered Nov 16 '22 01:11

Josh Smeaton