Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is Django QuerySet with Q() expression returning duplicate values?

I have the following 2 Django models:

from mptt.models import MPTTModel, TreeForeignKey
from django.db import models
from django.db.models import Q

class Model1(MPTTModel):
    random_field = models.IntegerField()
    parent = TreeForeignKey('self', null=True, blank=True)


class Model2(models.Model):
    model_1 = models.ManyToManyField(Model1)

    @staticmethod
    def descendants_queryset(model1):
        q = Q()
        for curr_descendant in model1.get_descendants:
            q |= Q(model_1=curr_descendant)
        return q

I have created instances like this:

>>> a = Model2.objects.create()
>>> b = Model1.objects.create(random_field=1, parent=None)
>>> c = Model1.objects.create(random_field=2, parent=b)
>>> d = Model1.objects.create(random_field=3, parent=b)
>>> a.model_1.add(c)
>>> a.pk
3

When I do a normal queryset filter and when I use the Q() expression it produces the same results (as expected):

>>> [i.pk for i in Model2.objects.filter(pk=3)]
[3]
>>> [i.pk for i in Model2.objects.filter(Model2.descendants_queryset(b), pk=3)]
[3]

But when I add another instance of the Model1 to the ManyToMany relationship, I see a weird duplication only when I filter using the Q() expression:

>>> a.model_1.add(d)
>>> [i.pk for i in Model2.objects.filter(pk=3)]
[3]
>>> [i.pk for i in Model2.objects.filter(Model2.descendants_queryset(b), pk=3)]
[3, 3]

I'm confused why this duplication is happening. It seems like a bug to me. I can obviously work around it by adding a .distinct() to the queryset. But that seems like it should not be necessary. Why is this happening and what is the proper solution?

like image 827
Saqib Ali Avatar asked Dec 26 '15 04:12

Saqib Ali


1 Answers

I noticed when you add a third element to a, your output is not only duplicated but tripled:

>>> 4 = Model1.objects.create(random_field=3, parent=b)
>>> a.model_1.add(e)
>>> [i.pk for i in Model2.objects.filter(Model2.descendants_queryset(b), pk=3)]
[3, 3, 3]

And quadrupled if you add another and so on...

So what I'm guessing is, that since your Q()-query in descendants_queryset() is ORed, it returns every object, which has the b object as parent and the filter matches multiple times for a (which has multple references to Model1 objects).

If we look at the raw SQL for Model2.objects.filter(Model2.descendants_queryset(b)), we see the following:

>>> Model2.objects.filter(Model2.descendants_queryset(b)).query.sql_with_params()
(u'SELECT "Foo_model2"."id" FROM "Foo_model2" LEFT OUTER JOIN "Foo_model2_model_1" ON ("Foo_model2"."id" = "Foo_model2_model_1"."model2_id") WHERE ("Foo_model2_model_1"."model1_id" = %s OR "Foo_model2_model_1"."model1_id" = %s OR "Foo_model2_model_1"."model1_id" = %s)', (17, 18, 19))

Or more readable:

SELECT "Foo_model2"."id"
FROM "Foo_model2"
  LEFT OUTER JOIN "Foo_model2_model_1"
  ON ("Foo_model2"."id" = "Foo_model2_model_1"."model2_id")
WHERE ("Foo_model2_model_1"."model1_id" = 17
  OR "Foo_model2_model_1"."model1_id" = 18
  OR "Foo_model2_model_1"."model1_id" = 19)

So it does actually concatenate the queries which are generated by q |= Q(model_1=curr_descendant) with OR statements, which returns not one, but in this case three references (all to the same Model2 object, which holds ManyToMany-references to three Model1 objects). This is due to the join statement - See here for some examples.

If we add the extra filter for pk=3, it does not limit the output any further, since the PK for all returned objects is identical (3).

If you add another Model2 object, and add c as reference to the new elements model1 ManyToMany-reference, you get the following:

>>> a2 = Model2.objects.create()
>>> a2.model_1.add(c)
>>> [i.pk for i in Model2.objects.filter(Model2.descendants_queryset(b))]
[3, 3, 3, 4]

The id of the new Model2 object shows up in the queryset as well, since it also has one reference to a model1 object.

I don't have any smashing ideas for the best solution right now, but calling .distinct() on the query set seems pretty straight forward to me.

like image 158
andreas-hofmann Avatar answered Sep 18 '22 17:09

andreas-hofmann