I am creating a sparse preferences table in Django. My model is simple:
class Preference(models.Model):
user = models.ForeignKey(settings.AUTH_USER_MODEL, related_name='preferences')
preference = models.CharField(max_length=255, db_index=True)
value = models.BooleanField()
Some preferences have default states, so I need to be able to ask the database two questions: "Which users have this preference set to a certain value?" and "Which users do not have this preference set to that value (either because they do not have the preference set or because they have proactively set the preference to another value)?"
My problem is that the former question works, but the latter question (the same query clauses, but with an exclude()
instead of a filter()
) does not work. For instance:
My test database has 14 users, and a single user has two preferences set: 'PREF_A'
is set to True
and 'PREF_B'
is set to False
.
>>> User.objects.all().count()
14
>>> User.objects.filter(preferences__preference="PREF_A", preferences__value=True).count()
1
>>> User.objects.exclude(preferences__preference="PREF_A", preferences__value=True).count()
13
>>> User.objects.filter(preferences__preference="PREF_A", preferences__value=False).count()
0
>>> User.objects.exclude(preferences__preference="PREF_A", preferences__value=False).count()
13
So, my results say that:
There are 14 users in total
1 user has PREF_A set to True
13 users do not have PREF_A set to True
0 users have PREF_A set to False
13 users do not have PREF_A set to False <--- this is inaccurate
Where is this query going wrong, and how can I write the query to properly exclude people who have a specific preference set to a specific value?
I have tried using Q
and ~Q
to see if the behavior would be different, but the results were the same.
This is a still-present gotcha in Django, where exclude()
is not acting as the reverse of filter()
. Here's the documentation explaining the difference:
Note
The behavior of
filter()
for queries that span multi-value relationships, as described above, is not implemented equivalently forexclude()
. Instead, the conditions in a singleexclude()
call will not necessarily refer to the same item.For example, the following query would exclude blogs that contain both entries with “Lennon” in the headline and entries published in 2008:
Blog.objects.exclude( entry__headline__contains='Lennon', entry__pub_date__year=2008, )
However, unlike the behavior when using
filter()
, this will not limit blogs based on entries that satisfy both conditions. In order to do that, i.e. to select all blogs that do not contain entries published with “Lennon” that were published in 2008, you need to make two queries:Blog.objects.exclude( entry__in=Entry.objects.filter( headline__contains='Lennon', pub_date__year=2008, ), )
What you've done is probably the way to go.
I implemented a quick-and-dirty solution so I could move on, expecting it to be horribly inefficient; however, on inspection of the SQL generated, it turned out to not be that bad:
>>> User.objects.exclude(id__in=User.objects.filter(preferences__preference="PREF_A", preferences__value=True))
I thought that the ORM would load the results of the subordinate query into the webserver's memory before completing (this is a problem because our production app will have in the millions of users), but in fact it properly uses a subquery:
>>> User.objects.exclude(id__in=User.objects.filter(preferences__preference="PREF_A", preferences__value=True)).values('id').query.sql_with_params()
(u'SELECT "sgauth_user"."id" FROM "sgauth_user" WHERE NOT ("sgauth_user"."id" IN (SELECT U0."id" FROM "sgauth_user" U0 INNER JOIN "feeds_preference" U1 ON (U0."id" = U1."user_id") WHERE (U1."preference" = %s AND U1."value" = %s )))', ('PREF_A', True))
I'm putting this up as one possible answer, but I am still interested if there is a way to do it with the straightforward exclude clause, or a way of generating a query through the ORM that works with straightforward joins and no subquery whatsoever.
You can use the new django SubQuery to avoid doing 2 queries to the server:
User.objects.exclude(id__in=SubQuery(User.objects.filter(preferences__preference="PREF_A", preferences__value=True)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With