Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Usage of a COUNT(DISTINCT field) with a GROUP BY clause in Django

Problem

I want to use a COUNT(DISTINCT field) with a GROUP BY clause in Django. As I understand, the COUNT(DISTINCT... can only be achieved by using an extra for the query set.

My simplified model is :

class Site(models.Model):
    name = models.CharField(max_length=128, unique=True)

class Application(models.Model):
    name = models.CharField(max_length=64)
    version = models.CharField(max_length=13, db_index=True)

class User(models.Model):
    name = models.CharField(max_length=64) 
    site = models.ForeignKey(Site, db_index=True) 

class Device(models.Model):
    imei = models.CharField(max_length=16, unique=True)

    applications = models.ManyToManyField(Application, null=True, db_index=True, through='ApplicationUsage') 
    user = models.ForeignKey(User, null=True, db_index=True)

class ApplicationUsage(models.Model):
    activity = models.DateField(db_index=True)

    application = models.ForeignKey(Application)
    device = models.ForeignKey(Device)

My goal is to have a liste of Site objects with a count of distinct device for each site given an application activity through a time period, something like

stats_site.name     deviceCount
ALBI                32
AMPLEPUIS           42
...

I try this code :

qs = models.Site.objects.filter(user__device__applicationusage__activity__range=[startDay, endDay])\
                            .extra(select={'deviceCount' : 'COUNT(DISTINCT `stats_device`.`id`)'})\
                            .values('name', 'deviceCount')\

The generated SQL is :

SELECT (COUNT(DISTINCT stats_device.id)) AS deviceCount, stats_site.name
FROM stats_site
INNER JOIN stats_user ON (stats_site.id = stats_user.site_id)
INNER JOIN stats_device ON (stats_user.id = stats_device.user_id)
INNER JOIN stats_applicationusage ON (stats_device.id = stats_applicationusage.device_id)
WHERE stats_applicationusage.activity BETWEEN '2013-07-01' AND '2013-07-03'

And the result is obviously wrong since it lacks the GROUP BY clause, which should be GROUP BY stats_site.name

The problem is: I don't know how to add the correct GROUP BY using the annotate function or other.

Solution

Using distinct=True on the Count function with annotate:

qs = models.Site.objects.filter(habileouser__device__applicationusage__activity__range=[startDay, endDay])\
                            .annotate(deviceCount=Count('habileouser__device', distinct=True))\
                            .values('name', 'deviceCount')
like image 875
Emmanuel Sys Avatar asked Jan 12 '23 13:01

Emmanuel Sys


1 Answers

The annotate method of a queryset will calculate an aggregate value for each element of the queryset, and when used after a values call will aggregate over the values of the values. I think this should work:

qs = models.Site.objects.filter(user__device__applicationusage__activity__range=[startDay, endDay]).values('name').annotate(Count('user__device', distinct=True))

If you have an ordering specified you may need to remove it as discussed here: https://docs.djangoproject.com/en/dev/topics/db/aggregation/#interaction-with-default-ordering-or-order-by

like image 193
Peter DeGlopper Avatar answered Jan 22 '23 07:01

Peter DeGlopper