Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django queryset exclude() with multiple related field clauses

I am creating a sparse preferences table in Django. My model is simple:

class Preference(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, related_name='preferences')
    preference = models.CharField(max_length=255, db_index=True)
    value = models.BooleanField()

Some preferences have default states, so I need to be able to ask the database two questions: "Which users have this preference set to a certain value?" and "Which users do not have this preference set to that value (either because they do not have the preference set or because they have proactively set the preference to another value)?"

My problem is that the former question works, but the latter question (the same query clauses, but with an exclude() instead of a filter()) does not work. For instance:

My test database has 14 users, and a single user has two preferences set: 'PREF_A' is set to True and 'PREF_B' is set to False.

>>> User.objects.all().count()
14
>>> User.objects.filter(preferences__preference="PREF_A", preferences__value=True).count()
1
>>> User.objects.exclude(preferences__preference="PREF_A", preferences__value=True).count()
13
>>> User.objects.filter(preferences__preference="PREF_A", preferences__value=False).count()
0
>>> User.objects.exclude(preferences__preference="PREF_A", preferences__value=False).count()
13

So, my results say that:

There are 14 users in total

  • 1 user has PREF_A set to True

  • 13 users do not have PREF_A set to True

  • 0 users have PREF_A set to False

  • 13 users do not have PREF_A set to False <--- this is inaccurate

Where is this query going wrong, and how can I write the query to properly exclude people who have a specific preference set to a specific value?

I have tried using Q and ~Q to see if the behavior would be different, but the results were the same.

like image 285
Andrew Gorcester Avatar asked May 23 '13 02:05

Andrew Gorcester


3 Answers

This is a still-present gotcha in Django, where exclude() is not acting as the reverse of filter(). Here's the documentation explaining the difference:

Note

The behavior of filter() for queries that span multi-value relationships, as described above, is not implemented equivalently for exclude(). Instead, the conditions in a single exclude() call will not necessarily refer to the same item.

For example, the following query would exclude blogs that contain both entries with “Lennon” in the headline and entries published in 2008:

Blog.objects.exclude(
    entry__headline__contains='Lennon',
    entry__pub_date__year=2008,
)

However, unlike the behavior when using filter(), this will not limit blogs based on entries that satisfy both conditions. In order to do that, i.e. to select all blogs that do not contain entries published with “Lennon” that were published in 2008, you need to make two queries:

Blog.objects.exclude(
    entry__in=Entry.objects.filter(
        headline__contains='Lennon',
        pub_date__year=2008,
    ),
)

What you've done is probably the way to go.

like image 133
M Somerville Avatar answered Oct 16 '22 05:10

M Somerville


I implemented a quick-and-dirty solution so I could move on, expecting it to be horribly inefficient; however, on inspection of the SQL generated, it turned out to not be that bad:

>>> User.objects.exclude(id__in=User.objects.filter(preferences__preference="PREF_A", preferences__value=True))

I thought that the ORM would load the results of the subordinate query into the webserver's memory before completing (this is a problem because our production app will have in the millions of users), but in fact it properly uses a subquery:

>>> User.objects.exclude(id__in=User.objects.filter(preferences__preference="PREF_A", preferences__value=True)).values('id').query.sql_with_params()
(u'SELECT "sgauth_user"."id" FROM "sgauth_user" WHERE NOT ("sgauth_user"."id" IN (SELECT U0."id" FROM "sgauth_user" U0 INNER JOIN "feeds_preference" U1 ON (U0."id" = U1."user_id") WHERE (U1."preference" = %s  AND U1."value" = %s )))', ('PREF_A', True))

I'm putting this up as one possible answer, but I am still interested if there is a way to do it with the straightforward exclude clause, or a way of generating a query through the ORM that works with straightforward joins and no subquery whatsoever.

like image 32
Andrew Gorcester Avatar answered Oct 16 '22 04:10

Andrew Gorcester


You can use the new django SubQuery to avoid doing 2 queries to the server:

User.objects.exclude(id__in=SubQuery(User.objects.filter(preferences__preference="PREF_A", preferences__value=True)))
like image 32
Patrik Beck Avatar answered Oct 16 '22 03:10

Patrik Beck