Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django queryset with annotate, why is GROUP BY applied to all fields?

I'm using Django 1.6 with PostgreSQL and have following model:

# models.py
class Game(AbstractContentModel, AbstractScoreModel):
    name = models.CharField(_("name"), max_length=100, blank=True)
    developer = models.CharField(_('Developer'), max_length=255)
    distributor = models.CharField(_('Distributor'), max_length=255, blank=True)
    # ...
    reviews = models.ManyToManyField(Review, related_name="games", blank=True, verbose_name=_("Reviews"))
    videos = models.ManyToManyField(Video, related_name="games", blank=True, verbose_name=_("Videos"))
    images = models.ManyToManyField(Image, related_name="games", blank=True, verbose_name=_("Gallery"))

I'm trying to get all games and to each of them add count of related videos, reviews and images as follows:

# views.py
qs = Game.objects.all()
qs = qs.annotate(video_count=models.Count('videos'))
qs = qs.annotate(review_count=models.Count('reviews'))
qs = qs.annotate(image_count=models.Count('images'))

Result query is:

SELECT 
"content_game"."id", 
"content_game"."name",
"content_game"."developer", 
"content_game"."distributor",
COUNT("content_game_videos"."video_id") AS "video_count",
COUNT("content_game_reviews"."review_id") AS "review_count", 
COUNT("content_game_images"."image_id") AS "image_count" 
FROM "content_game" 
LEFT OUTER JOIN "content_game_videos" ON ( "content_game"."id" = "content_game_videos"."game_id" )
LEFT OUTER JOIN "content_game_reviews" ON ( "content_game"."id" = "content_game_reviews"."game_id" ) 
LEFT OUTER JOIN "content_game_images" ON ( "content_game"."id" = "content_game_images"."game_id" ) 
GROUP BY 
"content_game"."id", 
"content_game"."name",
"content_game"."developer", 
"content_game"."distributor";

My question is - why there are all selected fields in GROUP BY clause? And most importantly, how can I get rid of them besides raw SQL?

I gathered that it would be good to use .values(), but I want every field in result query. If I just use GROUP BY "content_game"."id", the result is the same, but I don't know how to use it like this with Django ORM.

like image 848
Flaiming Avatar asked Dec 15 '15 08:12

Flaiming


1 Answers

I'm too late to answer but I have kind of studied answer. First of all, the behavior what OP said, is correct. And it should be for all kind of relational database, but as @lad2025 said in the comment the only MySQL (MariaDB as well) was excusing to this behavior. So instead of getting rid of it we must learn it. As this is the only correct way to use it / understand it and get used to with it.

Currently OP has found solution that is workaround of keep updating the table fields with counts of rows somewhere and populate later instead of retrieving it data on the fly by query.

As whatever you add into GROUP BY must be included into SELECT and vice versa while using aggregative functions.

"content_game"."id", "content_game"."name", "content_game"."developer",  "content_game"."distributor"

Above fields must come into GROUP BY because if I try to print id or any field from the selection which row's field (single) value this should return? either last or first row from the group bunch which is incorrect (which MySQL does). So better keep all into GROUP BY.

Why? Because of ambiguity - group by produces a single record for each group of records. Group By columns contain the same value for all records in the group and any aggregative function produces the single value for a group, but "content_game"."developer" can be different for "content_game"."distributor" group "content_game"."id" will be different for all rows as it's primary key. Same for all fields.

Good explanation can be found here

And I would like people to share (edit my post) more statement in more nicer way. So other people can found this useful in the future.

like image 189
Anup Yadav Avatar answered Nov 09 '22 12:11

Anup Yadav