Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I control the GROUP BY in django 1.3's orm?

I think this will best be explained with an example.

Here is what the data would looks like:

|project            |
|id|name            |
|1 |some project    |
|2 |my other project|

|run                                  |
|id|project_id|start_time   |result   |
|1 |1         |1305732581845|something|
|2 |1         |1305732593721|nothing  |
|3 |2         |1305732343721|nothing  |
|4 |2         |1305732556821|something|

I would like to be able to get an entire recordset from each of the latest runs by project. The SQL Query would look something like this:

SELECT *, MAX("run"."start_time")
FROM "run"    
LEFT OUTER JOIN "project" ON ("run"."project_id" = "project"."id") 
GROUP BY "project"."id"

This will return me all columns across both tables for the latest run of the project, which is great, it is exactly what I need.

So in attempting to find the django orm equivalent in django 1.3 I simply can't find a proper way to do it. If I do something like this:

Run.objects.annotate(Max('start_time'))

The generated SQL query will look something like:

SELECT 
"run"."id", "run"."result", "run"."project_id", "project"."id", "project"."name", 
MAX("run"."start_time")
FROM "run"
LEFT OUTER JOIN "project" ON ("run"."project_id" = "project"."id")
GROUP BY "run"."id", "run"."result", "run"."project_id", "project"."id", "project"."name"

This will not return me the proper results as the group by is incorrect for what I want. I believe in previous versions of django the following would properly and explicitly set the group by clause in the query but appears to not work in 1.3:

q = Run.objects.annotate(Max('start_time'))
q.query.group_by = [("project", "id")]

In 1.3 this generates the exact same query as not manually modifying the group_by property in the query.

I also tried this the logical way based on the documented behavior of .values() before and after the annotate() call but it did not work as expected. When I tried this:

q = Run.objects.values('project__id').annotate(Max('start_time')).values('id')

I ended up with a query like this:

SELECT 
"run"."id", "run"."project_id"
MAX("run"."start_time")
FROM "run"
LEFT OUTER JOIN "project" ON ("run"."project_id" = "project"."id")
GROUP BY "run"."id", "run"."project_id"

Can anyone point me to the correct way to do what I am doing without any of the following:

  • Using raw sql - what would be the point of using an orm when I constantly have to generate my own queries?
  • Using .extra(select = {'latest': 'somequery'}) - why should I have to use subqueries when a perfectly valid query without subqueries can give me what I want.
  • Using multiple queries to grab the same data - again, why should I have to make multiple queries to get results that are available in 1?
like image 401
mockobject Avatar asked Nov 14 '22 19:11

mockobject


1 Answers

tl;dr: Django does allow you to control the group by clause but it limits it to work across all flavors of SQL so I can't do what I want.

It has been pointed out to me that the original query I am trying to generate with the django ORM is not actually valid for all flavors of SQL. Here is a refresher of the query I was looking for:

SELECT *, MAX("run"."start_time")
FROM "run"    
LEFT OUTER JOIN "project" ON ("run"."project_id" = "project"."id") 
GROUP BY "project"."id"

If a person attempts to select something that is not in the GROUP BY in MSSQL they will actually get an error. So it seems to me that django actually shouldn't let me generate a query like this and I am essentially attempting to solve my problem incorrectly.

like image 131
mockobject Avatar answered Dec 18 '22 11:12

mockobject