Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Annotating with Count not working how I expect it to

Tags:

django

I'm experimenting with Django by building a simple RPG. It has armor upgrades. There are different categories of armor (e.g. head and body). Each category has many pieces of armor. For example, the "Head" category might have "Dragon Helm", "Duck Helm", "Needle Helm", and others.

In order for the user to see any of the armor available in the a category, they must first be granted access to at least one of the pieces of armor in that category. At that point, they can view all of the armor in that category—including the armor they can't yet purchase.

The Problem

I'm trying to efficiently query the database for all of a category's armor while simultaneously taking note of which pieces of armor the user has been granted access to. I have it kind of working, but not totally.

Relevant code

models.py

from django.contrib.auth.models import User
from django.db import models


class Armor(models.Model):
    armor_category = models.ForeignKey('ArmorCategory')
    name = models.CharField(max_length=100)
    profile = models.ManyToManyField('Profile', through='ProfileArmor')

    def __unicode__(self):
        return self.name


class ArmorCategory(models.Model):
    name = models.CharField(max_length=100)
    slug = models.SlugField(blank=True)

    class Meta:
        verbose_name_plural = 'Armor categories'

    def __unicode__(self):
        return self.name


class Profile(models.Model):
    user = models.OneToOneField(User)
    dob = models.DateField('Date of Birth')

    def __unicode__(self):
        return self.user.get_full_name()


class ProfileArmor(models.Model):
    profile = models.ForeignKey(Profile)
    armor = models.ForeignKey(Armor)
    date_created = models.DateTimeField(auto_now_add=True)

    class Meta:
        ordering = ('-date_created',)

    def __unicode__(self):
        return '%s: %s' % (self.profile.user.get_full_name(), self.armor.name)

urls.py

from django.conf.urls import patterns, url
from core import views

urlpatterns = patterns('',
    url(r'^upgrades/(?P<armor_category_slug>[-\w]+)/$', views.Upgrades.as_view(), name='upgrades'),
)

views.py (this file is where the problem is)

from django.db.models import Count, Q
from django.http import HttpResponseRedirect
from django.shortcuts import get_object_or_404
from django.views.generic.base import TemplateView
from .models import ArmorCategory


class Upgrades(TemplateView):
    template_name = 'core/upgrades.html'

    def get(self, request, *args, **kwargs):
        # Make sure the slug is valid.

        self.armor_category = get_object_or_404(ArmorCategory, slug=kwargs['armor_category_slug'])

        # Make sure the user has been granted access to at least one item in
        # this category, otherwise there is no point for the user to even be
        # here.

        if self.armor_category.armor_set.filter(
            profilearmor__profile=self.request.user.profile
        ).count() == 0:
            return HttpResponseRedirect('/')

        return super(Upgrades, self).get(request, *args, **kwargs)

    def get_context_data(self, **kwargs):
        # THIS IS WHERE THE PROBLEM IS.

        # Get all of the armor in this category, but also take note of which
        # armor this user has been granted access to.

        armor = self.armor_category.armor_set.filter(
            Q(profilearmor__profile=self.request.user.profile) |
            Q(profilearmor__profile=None)
        ).annotate(profile_armor_count=Count('profilearmor__id'))

        print armor.query

        for armor_item in armor:
            print '%s: %s' % (armor_item.name, armor_item.profile_armor_count)

        return {
            'armor_category': self.armor_category,
            'armor': armor,
        }

More Details of the Problem

I created the "Head" category and gave it the three pieces of armor indicated in the first paragraph of this question. I created the armor in the same order I have them listed above. I then created two user profiles.

I gave the first user profile access to the "Duck Helm" armor. I then accessed /upgrades/head/ with the first user profile and got this output from the for loop:

Dragon Helm: 0
Duck Helm: 1
Needle Helm: 0

This was the expected output. Next, I gave the second user profile access to the "Dragon Helm" armor. When I accessed the same URL with the second user profile, I got this output:

Dragon Helm: 1
Needle Helm: 0

Why isn't the "Duck Helm" armor listed? I decided to go back to the same URL with the first user profile again to make sure it was still working. When I did that, I got this output:

Duck Helm: 1
Needle Helm: 0

Now the "Dragon Helm" armor was gone.

Any ideas?

like image 310
Nick Avatar asked Mar 29 '13 07:03

Nick


1 Answers

1. Quick explanation

When you test a many-to-many relation against None in a Django queryset, as you do here:

Q(profilearmor__profile=None)

that matches rows where there are no corresponding rows in the many-to-many relation. So your query

self.armor_category.armor_set.filter(
    Q(profilearmor__profile=self.request.user.profile) |
    Q(profilearmor__profile=None))

matches items of armor which either the self.request.user has access to, or which no-one has access to. That's why your query for the second user failed to return a row for the Duck Helm: because someone (namely the first user) had access to it.

2. What to do instead

The query that you want to run looks like this in SQL:

SELECT `myapp_armor`.`id`, `myapp_armor`.`armor_category_id`, `myapp_armor`.`name`,
       COUNT(`myapp_profilearmor`.`id`) AS `profile_armor_count`
FROM `myapp_armor`
LEFT OUTER JOIN `myapp_profilearmor`
             ON `myapp_armor`.`id` = `myapp_profilearmor`.`armor_id`
                AND `myapp_profilearmor`.`profile_id` = %s
WHERE `myapp_armor`.`armor_category_id` = %s
GROUP BY `myapp_armor`.`id`, `myapp_armor`.`armor_category_id`, `myapp_armor`.`name`

Unfortunately, Django's object-relational mapping system doesn't seem to provide a way to express this query. But you can always bypass the ORM and issue a raw SQL query. Like this:

sql = '''
    SELECT `myapp_armor`.`id`, `myapp_armor`.`armor_category_id`, `myapp_armor`.`name`,
           COUNT(`myapp_profilearmor`.`id`) AS `profile_armor_count`
    FROM `myapp_armor`
    LEFT OUTER JOIN `myapp_profilearmor`
                 ON `myapp_armor`.`id` = `myapp_profilearmor`.`armor_id`
                    AND `myapp_profilearmor`.`profile_id` = %s
    WHERE `myapp_armor`.`armor_category_id` = %s
    GROUP BY `myapp_armor`.`id`, `myapp_armor`.`armor_category_id`, `myapp_armor`.`name`
'''
armor = Armor.objects.raw(sql, [self.request.user.profile.id, self.armor_category.id])
for armor_item in armor:
    print('{:14}{}'.format(armor_item.name, armor_item.profile_armor_count))

For example:

>>> helmets = ArmorCategory.objects.get(id=1)
>>> profile = Profile.objects.get(id=1)
>>> armor = Armor.objects.raw(sql, [profile.id, helmets.id])
>>> for armor_item in armor:
...     print('{:14}{}'.format(armor_item.name, armor_item.profile_armor_count))
... 
Dragon Helm   0
Duck Helm     1
Needle Helm   0

3. How you could have figured this out for yourself

Here's the offending Django query:

armor = self.armor_category.armor_set.filter(
    Q(profilearmor__profile=self.request.user.profile) |
    Q(profilearmor__profile=None)
).annotate(profile_armor_count=Count('profilearmor__id'))

When you don't understand why a query is producing the wrong results, it's always worth looking at the actual SQL, which you can do by taking the queryset's query attribute and converting it to a string:

>>> from django.db.models import Q
>>> helmets = ArmorCategory.objects.get(name='Helmets')
>>> profile = Profile.objects.get(id=1)
>>> print(helmets.armor_set.filter(Q(profilearmor__profile=profile) |
...                                Q(profilearmor__profile=None)
... ).annotate(profile_armor_count=Count('profilearmor__id')).query)
SELECT `myapp_armor`.`id`, `myapp_armor`.`armor_category_id`, `myapp_armor`.`name`,
       COUNT(`myapp_profilearmor`.`id`) AS `profile_armor_count`
FROM `myapp_armor`
LEFT OUTER JOIN `myapp_profilearmor`
             ON (`myapp_armor`.`id` = `myapp_profilearmor`.`armor_id`) 
LEFT OUTER JOIN `myapp_profile`
             ON (`myapp_profilearmor`.`profile_id` = `myapp_profile`.`id`) 
WHERE (`myapp_armor`.`armor_category_id` = 1
       AND (`myapp_profilearmor`.`profile_id` = 1
            OR `myapp_profile`.`id` IS NULL))
GROUP BY `myapp_armor`.`id`, `myapp_armor`.`armor_category_id`, `myapp_armor`.`name`
ORDER BY NULL

What does this mean? If you know all about SQL joins, you can skip to section 5. Otherwise, read on.

4. Introduction to SQL joins

As I'm sure you know, a join of tables in SQL consists of combinations of rows from those tables (subject to some conditions that you specify in your query). For example, if you have two categories of armor and six items of armor:

mysql> SELECT * FROM myapp_armorcategory;
+----+---------+---------+
| id | name    | slug    |
+----+---------+---------+
|  1 | Helmets | helmets |
|  2 | Suits   | suits   |
+----+---------+---------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM myapp_armor;
+----+-------------------+-------------+
| id | armor_category_id | name        |
+----+-------------------+-------------+
|  1 |                 1 | Dragon Helm |
|  2 |                 1 | Duck Helm   |
|  3 |                 1 | Needle Helm |
|  4 |                 2 | Spiky Suit  |
|  5 |                 2 | Flower Suit |
|  6 |                 2 | Battle Suit |
+----+-------------------+-------------+
6 rows in set (0.00 sec)

then a JOIN of those two tables will contain all 12 combinations of the 2 rows from the first table and the 6 rows of the second table:

mysql> SELECT * FROM myapp_armorcategory JOIN myapp_armor;
+----+---------+---------+----+-------------------+-------------+
| id | name    | slug    | id | armor_category_id | name        |
+----+---------+---------+----+-------------------+-------------+
|  1 | Helmets | helmets |  1 |                 1 | Dragon Helm |
|  2 | Suits   | suits   |  1 |                 1 | Dragon Helm |
|  1 | Helmets | helmets |  2 |                 1 | Duck Helm   |
|  2 | Suits   | suits   |  2 |                 1 | Duck Helm   |
|  1 | Helmets | helmets |  3 |                 1 | Needle Helm |
|  2 | Suits   | suits   |  3 |                 1 | Needle Helm |
|  1 | Helmets | helmets |  4 |                 2 | Spiky Suit  |
|  2 | Suits   | suits   |  4 |                 2 | Spiky Suit  |
|  1 | Helmets | helmets |  5 |                 2 | Flower Suit |
|  2 | Suits   | suits   |  5 |                 2 | Flower Suit |
|  1 | Helmets | helmets |  6 |                 2 | Battle Suit |
|  2 | Suits   | suits   |  6 |                 2 | Battle Suit |
+----+---------+---------+----+-------------------+-------------+
12 rows in set (0.00 sec)

Normally one adds conditions to the join to restrict the rows returned so that they make sense. For example, when joining the armor category table with the armor table we're only interested in combinations where the armor belongs to the armor category:

mysql> SELECT * FROM myapp_armorcategory JOIN myapp_armor
           ON myapp_armorcategory.id = myapp_armor.armor_category_id;
+----+---------+---------+----+-------------------+-------------+
| id | name    | slug    | id | armor_category_id | name        |
+----+---------+---------+----+-------------------+-------------+
|  1 | Helmets | helmets |  1 |                 1 | Dragon Helm |
|  1 | Helmets | helmets |  2 |                 1 | Duck Helm   |
|  1 | Helmets | helmets |  3 |                 1 | Needle Helm |
|  2 | Suits   | suits   |  4 |                 2 | Spiky Suit  |
|  2 | Suits   | suits   |  5 |                 2 | Flower Suit |
|  2 | Suits   | suits   |  6 |                 2 | Battle Suit |
+----+---------+---------+----+-------------------+-------------+
6 rows in set (0.08 sec)

This is all straightforward. But a problem arises where there are records in one table that have no matching item in the other. Let's add a new armor category with no corresponding items of armor:

mysql> INSERT INTO myapp_armorcategory (name, slug) VALUES ('Arm Guards', 'armguards');
Query OK, 1 row affected (0.00 sec)

If we re-run the query above (SELECT * FROM myapp_armorcategory JOIN myapp_armor ON myapp_armorcategory.id = myapp_armor.armor_category_id;) we'll get the same result: the new armor category has no matching records in the armor table so it doesn't appear in the JOIN. If we want to see all armor categories appear in the results, regardless of whether they have a matching row in the other table, we have to run a so-called outer join: in particular a LEFT OUTER JOIN:

mysql> SELECT * FROM myapp_armorcategory LEFT OUTER JOIN myapp_armor
           ON myapp_armorcategory.id = myapp_armor.armor_category_id;

+----+------------+-----------+------+-------------------+-------------+
| id | name       | slug      | id   | armor_category_id | name        |
+----+------------+-----------+------+-------------------+-------------+
|  1 | Helmets    | helmets   |    1 |                 1 | Dragon Helm |
|  1 | Helmets    | helmets   |    2 |                 1 | Duck Helm   |
|  1 | Helmets    | helmets   |    3 |                 1 | Needle Helm |
|  2 | Suits      | suits     |    4 |                 2 | Spiky Suit  |
|  2 | Suits      | suits     |    5 |                 2 | Flower Suit |
|  2 | Suits      | suits     |    6 |                 2 | Battle Suit |
|  3 | Arm Guards | armguards | NULL |              NULL | NULL        |
+----+------------+-----------+------+-------------------+-------------+
7 rows in set (0.00 sec)

For each row in the left-hand table that has no matching rows in the right-hand table, the left outer join contains a row with NULL for each column on the right.

5. Explaining why the query goes wrong

Let's create a single profile who has access to the Duck Helm:

mysql> INSERT INTO myapp_profile (name) VALUES ('user1');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO myapp_profilearmor (profile_id, armor_id) VALUES (1, 2);
Query OK, 1 row affected (0.00 sec)

Then let's run a simplified version of your query to try to understand what it is doing:

mysql> SELECT `myapp_armor`.*, `myapp_profilearmor`.*, T5.*
        FROM `myapp_armor`
        LEFT OUTER JOIN `myapp_profilearmor`
                    ON (`myapp_armor`.`id` = `myapp_profilearmor`.`armor_id`)
        LEFT OUTER JOIN `myapp_profile` T5
                    ON (`myapp_profilearmor`.`profile_id` = T5.`id`)
        WHERE `myapp_armor`.`armor_category_id` = 1;

+----+-------------------+-------------+------+------------+----------+------+-------+
| id | armor_category_id | name        | id   | profile_id | armor_id | id   | name  |
+----+-------------------+-------------+------+------------+----------+------+-------+
|  1 |                 1 | Dragon Helm | NULL |       NULL |     NULL | NULL | NULL  |
|  2 |                 1 | Duck Helm   |    1 |          1 |        1 |    1 | user1 |
|  3 |                 1 | Needle Helm | NULL |       NULL |     NULL | NULL | NULL  |
+----+-------------------+-------------+------+------------+----------+------+-------+
3 rows in set (0.04 sec)

So when you add the condition (myapp_profilearmor.profile_id = 1 OR T5.id IS NULL) then you'll get all the rows from this join.

Now let's create a second profile with access to the Dragon Helm:

mysql> INSERT INTO myapp_profile (name) VALUES ('user2');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO myapp_profilearmor (profile_id, armor_id) VALUES (2, 1);
Query OK, 1 row affected, 1 warning (0.09 sec)

And re-run the join:

mysql> SELECT `myapp_armor`.*, `myapp_profilearmor`.*, T5.*
        FROM `myapp_armor`
        LEFT OUTER JOIN `myapp_profilearmor`
                    ON (`myapp_armor`.`id` = `myapp_profilearmor`.`armor_id`)
        LEFT OUTER JOIN `myapp_profile` T5
                    ON (`myapp_profilearmor`.`profile_id` = T5.`id`)
        WHERE `myapp_armor`.`armor_category_id` = 1;

+----+-------------------+-------------+------+------------+----------+------+-------+
| id | armor_category_id | name        | id   | profile_id | armor_id | id   | name  |
+----+-------------------+-------------+------+------------+----------+------+-------+
|  1 |                 1 | Dragon Helm |    2 |          2 |        1 |    2 | user2 |
|  2 |                 1 | Duck Helm   |    1 |          1 |        2 |    1 | user1 |
|  3 |                 1 | Needle Helm | NULL |       NULL |     NULL | NULL | NULL  |
+----+-------------------+-------------+------+------------+----------+------+-------+
3 rows in set (0.00 sec)

Now you can see that when you run the query for the second user profile, the extra condition on the join will be (myapp_profilearmor.profile_id = 2 OR T5.id IS NULL) and this will select only rows 1 and 3. Row 2 will be missing.

So you can see that your original filter Q(profilearmor__profile=None) became the subclause T5.id IS NULL and this only selects rows where there is no entry (for any profile) in the ProfileArmor relation.

6. Other comments on your code

  1. You test to see if the profile has access to any type of armor in a category using the count() method on the queryset:

    if self.armor_category.armor_set.filter(
        profilearmor__profile=self.request.user.profile
    ).count() == 0:
    

    but since you don't actually care about the number of types of armor in the category which the profile has access to, just whether there's any, you should use the exists() method instead.

  2. Having gone to the trouble to set up a ManyToManyField on the Armor model, why not use it? That is, instead of querying the Armor model like this:

    Q(profilearmor__profile = ...)
    

    you can run the same query like this and save some typing:

    Q(profile = ...)
    

7. Postscript

This was an excellent question. All too often, Django questions don't provide enough detail of the models, for anyone to be able to answer with confidence.

like image 146
Gareth Rees Avatar answered Nov 15 '22 12:11

Gareth Rees