Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django filter() on field of related model

Tags:

I think I'm missing something very basic and fundamental about how Django's filter() method is supposed to work.

Using the following models:

class Collection(models.Model): 
    pass

class Item(models.Model):
    flag = models.BooleanField()
    collection =  models.ForeignKey(Collection)

and with data supplied by calling the populate() function at the bottom of the question, try executing the following in ./manage.py shell:

len(Collection.objects.filter(item__flag=True))

My expectation was that this would print "2", which is the number of Collections that have at least one Item with flag=True. This expectation was based on the documentation at https://docs.djangoproject.com/en/1.5/topics/db/queries/#lookups-that-span-relationships, which has an example saying "This example retrieves all Entry objects with a Blog whose name is 'Beatles Blog'".

However, the call above actually prints "6", which is the number of Item records that have flag=True. The actual objects returned are Collection objects, though. It seems that it's returning the same Collection object multiple times, once for each corresponding Item record with flag=True. This can be confirmed by:

queryset = Collection.objects.filter(item__flag=True)
queryset[0] == queryset[1]

which prints True.

Is this the correct behaviour? If so what is the rationale? If it is what is expected, the documentation could be construed as strictly correct, but it omits to say that each object can be returned multiple times.

Here is a related example, which seems to be very surprising (or just plain wrong) behaviour. It caught me out in a case where an exclude() call was being added by a custom model manager and the caller was then adding a filter():

from django.db.models import Count    
[coll.count for coll in Collection.objects.filter(item__flag=True).annotate(count=Count("item"))]
[coll.count for coll in Collection.objects.exclude(item=None).filter(item__flag=True).annotate(count=Count("item"))]

The first case prints "[2,4]", but the second prints "[8,16]"!!!

Populate function:

def populate():
    Collection.objects.all().delete()

    collection = Collection()
    collection.save()
    item = Item(collection=collection, flag=True)
    item.save()
    item = Item(collection=collection, flag=True)
    item.save()
    item = Item(collection=collection, flag=False)
    item.save()
    item = Item(collection=collection, flag=False)
    item.save()

    collection = Collection()
    collection.save()
    item = Item(collection=collection, flag=True)
    item.save()
    item = Item(collection=collection, flag=True)
    item.save()
    item = Item(collection=collection, flag=True)
    item.save()
    item = Item(collection=collection, flag=True)
    item.save()

    collection = Collection()
    collection.save()
    item = Item(collection=collection, flag=False)
    item.save()
    item = Item(collection=collection, flag=False)
    item.save()
    item = Item(collection=collection, flag=False)
    item.save()
    item = Item(collection=collection, flag=False)
    item.save()
like image 931
Tom Avatar asked Jun 16 '13 10:06

Tom


People also ask

What is the purpose of filter () method in Django?

The filter() method is used to filter you search, and allows you to return only the rows that matches the search term.

How do you use get and filter together in Django?

This exception is also an attribute of the model class. Returns a new QuerySet containing objects that match the given lookup parameters. Basically use get() when you want to get a single unique object, and filter() when you want to get all objects that match your lookup parameters. Show activity on this post.

Can you filter by property Django?

Nope. Django filters operate at the database level, generating SQL. To filter based on Python properties, you have to load the object into Python to evaluate the property--and at that point, you've already done all the work to load it.

What does .values do in Django?

The values_list() method allows you to return only the columns that you specify.


1 Answers

It turns out there are two parts to this. First is the distinct() method, for which the doc says:

By default, a QuerySet will not eliminate duplicate rows. In practice, this is rarely a problem, because simple queries such as Blog.objects.all() don’t introduce the possibility of duplicate result rows. However, if your query spans multiple tables, it’s possible to get duplicate results when a QuerySet is evaluated. That’s when you’d use distinct().

The following outputs "2" as expected:

len(Collection.objects.filter(item__flag=True).distinct())

However, this doesn't help with the more complex example I gave, using annotate(). It turns out this is an instance of a known issue: https://code.djangoproject.com/ticket/10060.

like image 113
Tom Avatar answered Sep 22 '22 04:09

Tom