Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use inverse of a GenericRelation

I must be really misunderstanding something with the GenericRelation field from Django's content types framework.

To create a minimal self contained example, I will use the polls example app from the tutorial. Add a generic foreign key field into the Choice model, and make a new Thing model:

class Choice(models.Model):
    ...
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    thing = GenericForeignKey('content_type', 'object_id')

class Thing(models.Model):
    choices = GenericRelation(Choice, related_query_name='things')

With a clean db, synced up tables, and create a few instances:

>>> poll = Poll.objects.create(question='the question', pk=123)
>>> thing = Thing.objects.create(pk=456)
>>> choice = Choice.objects.create(choice_text='the choice', pk=789, poll=poll, thing=thing)
>>> choice.thing.pk
456
>>> thing.choices.get().pk
789

So far so good - the relation works in both directions from an instance. But from a queryset, the reverse relation is very weird:

>>> Choice.objects.values_list('things', flat=1)
[456]
>>> Thing.objects.values_list('choices', flat=1)
[456]

Why the inverse relation gives me again the id from the thing? I expected instead the primary key of the choice, equivalent to the following result:

>>> Thing.objects.values_list('choices__pk', flat=1)
[789]

Those ORM queries generate SQL like this:

>>> print Thing.objects.values_list('choices__pk', flat=1).query
SELECT "polls_choice"."id" FROM "polls_thing" LEFT OUTER JOIN "polls_choice" ON ( "polls_thing"."id" = "polls_choice"."object_id" AND ("polls_choice"."content_type_id" = 10))
>>> print Thing.objects.values_list('choices', flat=1).query
SELECT "polls_choice"."object_id" FROM "polls_thing" LEFT OUTER JOIN "polls_choice" ON ( "polls_thing"."id" = "polls_choice"."object_id" AND ("polls_choice"."content_type_id" = 10))

The Django docs are generally excellent, but I can't understand why the second query or find any documentation of that behaviour - it seems to return data from the wrong table completely?

like image 851
wim Avatar asked Mar 22 '16 19:03

wim


2 Answers

TL;DR This was a bug in Django 1.7 that was fixed in Django 1.8.

  • Fix commit: 1c5cbf5e5d5b350f4df4aca6431d46c767d3785a
  • Fix PR: GenericRelation filtering targets related model's pk
  • Bug ticket: Should filter on related model primary key value, not the object_id value

The change went directly to master and did not go under a deprecation period, which isn't too surprising given that maintaining backwards compatibility here would have been really difficult. More surprising is that there was no mention of the issue in the 1.8 release notes, since the fix changes behavior of currently working code.

The remainder of this answer is a description of how I found the commit using git bisect run. It's here for my own reference more than anything, so I can come back here if I ever need to bisect a large project again.


First we setup a django clone and a test project to reproduce the issue. I used virtualenvwrapper here, but you can do the isolation however you wish.

cd /tmp
git clone https://github.com/django/django.git
cd django
git checkout tags/1.7
mkvirtualenv djbisect
export PYTHONPATH=/tmp/django  # get django clone into sys.path
python ./django/bin/django-admin.py startproject djbisect
export PYTHONPATH=$PYTHONPATH:/tmp/django/djbisect  # test project into sys.path
export DJANGO_SETTINGS_MODULE=djbisect.mysettings

create the following file:

# /tmp/django/djbisect/djbisect/models.py
from django.db import models
from django.contrib.contenttypes.models import ContentType
from django.contrib.contenttypes.fields import GenericForeignKey, GenericRelation

class GFKmodel(models.Model):
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    gfk = GenericForeignKey()

class GRmodel(models.Model):
    related_gfk = GenericRelation(GFKmodel)

also this one:

# /tmp/django/djbisect/djbisect/mysettings.py
from djbisect.settings import *
INSTALLED_APPS += ('djbisect',)

Now we have a working project, create the test_script.py to use with git bisect run:

#!/usr/bin/env python
import subprocess, os, sys

db_fname = '/tmp/django/djbisect/db.sqlite3'
if os.path.exists(db_fname):
    os.unlink(db_fname)

cmd = 'python /tmp/django/djbisect/manage.py migrate --noinput'
subprocess.check_call(cmd.split())

import django
django.setup()

from django.contrib.contenttypes.models import ContentType
from djbisect.models import GFKmodel, GRmodel

ct = ContentType.objects.get_for_model(GRmodel)
y = GRmodel.objects.create(pk=456)
x = GFKmodel.objects.create(pk=789, content_type=ct, object_id=y.pk)

query1 = GRmodel.objects.values_list('related_gfk', flat=1)
query2 = GRmodel.objects.values_list('related_gfk__pk', flat=1)

print(query1)
print(query2)

print(query1.query)
print(query2.query)

if query1[0] == 789 == query2[0]:
    print('FIXED')
    sys.exit(1)
else:
    print('UNFIXED')
    sys.exit(0)

The script must be executable, so add the flag with chmod +x test_script.py. It should be located in the directory that Django is cloned into, i.e. /tmp/django/test_script.py for me. This is because import django should pick up the locally checked-out django project first, not any version from site-packages.

The user interface of git bisect was designed to find out where bugs appeared, so the usual prefixes of "bad" and "good" are backwards when you're trying to find out when a certain bug was fixed. This may seem somewhat upside-down, but the test script should exit with success (return code 0) if the bug is present, and it should fail out (with nonzero return code) if the bug is fixed. This tripped me up a few times!

git bisect start --term-new=fixed --term-old=unfixed
git bisect fixed tags/1.8
git bisect unfixed tags/1.7
git bisect run ./test_script.py

So this process will do an automated search which eventually finds the commit where the bug was fixed. It takes some time, because there were a lot of commits between Django 1.7 and Django 1.8. It bisected 1362 revisions, roughly 10 steps, and eventually output:

1c5cbf5e5d5b350f4df4aca6431d46c767d3785a is the first fixed commit
commit 1c5cbf5e5d5b350f4df4aca6431d46c767d3785a
Author: Anssi Kääriäinen <[email protected]>
Date:   Wed Dec 17 09:47:58 2014 +0200

    Fixed #24002 -- GenericRelation filtering targets related model's pk

    Previously Publisher.objects.filter(book=val) would target
    book.object_id if book is a GenericRelation. This is inconsistent to
    filtering over reverse foreign key relations, where the target is the
    related model's primary key.

That's precisely the commit where the query has changed from the incorrect SQL (which gets data from the wrong table)

SELECT "djbisect_gfkmodel"."object_id" FROM "djbisect_grmodel" LEFT OUTER JOIN "djbisect_gfkmodel" ON ( "djbisect_grmodel"."id" = "djbisect_gfkmodel"."object_id" AND ("djbisect_gfkmodel"."content_type_id" = 8) )

into the correct version:

SELECT "djbisect_gfkmodel"."id" FROM "djbisect_grmodel" LEFT OUTER JOIN "djbisect_gfkmodel" ON ( "djbisect_grmodel"."id" = "djbisect_gfkmodel"."object_id" AND ("djbisect_gfkmodel"."content_type_id" = 8) )

Of course, from the commit hash we're able to find the pull request and the ticket easily on github. Hopefully this may help someone else one day too - bisecting Django can be tricky to setup due to the migrations!

like image 50
wim Avatar answered Oct 22 '22 07:10

wim


Comment - too late for answer - most deleted

A not important result of the backward incompatible fix of issue #24002 is that the GenericRelatedObjectManager (e.g. things) stopped working for a query set long time and it could be used only for filters etc.

>>> choice.things.all()
TypeError: unhashable type: 'GenericRelatedObjectManager'
# originally before 1c5cbf5e5:  [<Thing: Thing object>]

It has been fixed half year later by #24940 in version 1.8.3 and in master branch. The problem was not important because the generic name thing works easier without query (choice.thing) and it is not clear that this usage is documented or undocumented.

docs: Reverse generic relations:

Setting related_query_name creates a relation from the related object back to this one. This allows querying and filtering from the related object.

It would be nice if the specific relation name could be used instead of only the generic. With the example from docs: taged_item.bookmarks is more readable than taged_item.content_object, but it would not be worth of work to implement it.

like image 2
hynekcer Avatar answered Oct 22 '22 05:10

hynekcer