Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django - how to filter using QuerySet to get subset of objects?

Tags:

python

django

According to documentation:

filter(**kwargs) Returns a new QuerySet containing objects that match the given lookup parameters.

The lookup parameters (**kwargs) should be in the format described in Field lookups below. Multiple parameters are joined via AND in the underlying SQL statement.

Which to me suggests it will return a subset of items that were in original set. However I seem to be missing something as below example does not behave as I would expect:

>>> kids = Kid.objects.all()
>>> tuple(k.name for k in kids)
(u'Bob',)
>>> toys = Toy.objects.all()
>>> tuple( (t.name, t.owner.name) for t in toys)
((u'car', u'Bob'), (u'bear', u'Bob'))
>>> subsel = Kid.objects.filter( owns__in = toys )
>>> tuple( k.name for k in subsel )
(u'Bob', u'Bob')
>>> str(subsel.query)
'SELECT "bug_kid"."id", "bug_kid"."name" FROM "bug_kid" INNER JOIN "bug_toy" ON ("bug_kid"."id" = "bug_toy"."owner_id") WHERE "bug_toy"."id" IN (SELECT U0."id" FROM "bug_toy" U0)'

As you can see in above subsel ends up returning duplicate records, which is not what I wanted. My question is what is the proper way to get subset? (note: set by definition will not have multiple occurrences of the same object)

Explanation as to why it behaves like that would be also nice, as to me filter means what you achieve with filter() built-in function in Python. Which is: take elements that fulfill requirement (or in other words discard ones that do not). And this definition doesn't seem to allow introduction/duplication of objects.

I know can aplly distinct() to the whole thing, but that still results in rather ugly (and probably slower than could be) query:

>>> str( subsel.distinct().query )
'SELECT DISTINCT "bug_kid"."id", "bug_kid"."name" FROM "bug_kid" INNER JOIN "bug_toy" ON ("bug_kid"."id" = "bug_toy"."owner_id") WHERE "bug_toy"."id" IN (SELECT U0."id" FROM "bug_toy" U0)'

My models.py for completeness:

from django.db import models

class Kid(models.Model):
    name = models.CharField(max_length=200)

class Toy(models.Model):
    name = models.CharField(max_length=200)
    owner = models.ForeignKey(Kid, related_name='owns')

edit:

After a chat with @limelight the conclusion is that my problem is that I expect filter() to behave according to dictionary definition. And i.e. how it works in Python or any other sane framework/language.

More precisely if I have set A = {x,y,z} and I invoke A.filter( <predicate> ) I don't expect any elements to get duplicated. With Django's QuerySet however it behaves like this:

A = {x,y,z}
A.filter( <predicate> )
# now A i.e. = {x,x}

So first of all the issue is inappropriate method name (something like match() would be much better). Second thing is that I think it is possible to create more efficient query than what Django allows me to. I might be wrong on that, if I will have a bit of time I will probably try to check if that is true.

like image 427
elmo Avatar asked Aug 26 '13 16:08

elmo


People also ask

Can I filter a QuerySet Django?

Working with Filter Easily the most important method when working with Django models and the underlying QuerySets is the filter() method, which allows you to generate a QuerySet of objects that match a particular set of filtered parameters.


2 Answers

This is kind of ugly, but works (without any type safety):

toy_owners = Toy.objects.values("owner_id")  # optionally with .distinct()
Kid.objects.filter(id__in=toy_owners)

If performance is not an issue, I think @limelights is right.

PS! I tested your query on Django 1.6b2 and got the same unnecessary complex query.

like image 87
chlunde Avatar answered Nov 05 '22 04:11

chlunde


Instead DISTINCT you can use GROUP BY (annotate in django) to get distinct kids.

toy_owners = Toy.objects.values_list("owner_id", flat=True).distinct()
Kid.objects.only('name').filter(pk__in=toy_owners).annotate(count=Count('owns'))
like image 24
P̲̳x͓L̳ Avatar answered Nov 05 '22 02:11

P̲̳x͓L̳