Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Preventing O(n) Queries with Intermediary Models

I have the following models:

class Artist(models.Model):
    name = models.CharField()

    def primary_group(self):
        return self.memberships.select_related('group').get(is_primary=True)

class Group(models.Model):
    name = models.CharField()
    members = models.ManyToManyField(Artist, through='Membership')

class Membership(models.Model):
    artist = models.ForeignKey(Artist, related_name='memberships')
    group = models.ForeignKey(Group)
    is_primary = models.BooleanField()

Artist and Group are linked via an intermediary model, Membership. Artists can only have one primary group, which is marked via is_primary, validated, etc.

In a template where I list artists, I list basic artist information in addition to their primary group, called by the method above. However, that's an O(n) operation and I have around 160 artists to do this to. The SQL that the django-debug-toolbar provides is as follows:

SELECT ••• FROM "people_membership" 
           LEFT OUTER JOIN "people_group" ON ("people_membership"."group_id" = "people_group"."id") 
           WHERE ("people_membership"."artist_id" = xx AND "people_membership"."is_primary" = true )

Let me add that this happens for every artist listed, so I get about 160 of these.

Is O(n) the best that can be done, considering that I call a model method? Or is there something else I can do to improve this (short of denormalizing primary_group)? This seems to be a problem with any sort of information that is stored in an intermediary model that I'd like to call from either the source or the target.

like image 728
Bryan Veloso Avatar asked Nov 29 '12 02:11

Bryan Veloso


2 Answers

I would do it like David Cramer says but instead of extra:

primary_memberships = {m.artist_id: m.group for m in Membership.objects.filter(group__isprimary=True, artist__in=artists).select_related('group')}
for artist in artists:
    artists.primary_membership = primary_memberships.get(artist.id)

For bonus points make this a method on membership's manager so you can apply it to any list of artists easily!

like image 42
Alex Gaynor Avatar answered Oct 20 '22 06:10

Alex Gaynor


You can easily do this with two queries, which despite what any haters will say, doesn't matter at all:

artists = list(Artist.objects.all())
primary_memberships = {m.artist_id: m for m in Group.objects.filter(is_primary=True, membership__artist__in=artists).extra(select={'artist_id': '%s.artist_id' % (Membership._meta.db_table,)})}
for artist in artists:
    artist.primary_membership = primary_memberships.get(artist.id)

(The extra clause may not be correct, but you get the idea)

In addition to this, I would change the primary function to do like:

if hasattr(self, '_primary_membership_cache'):
    return self._primary_membership_cache

And then if you attach the information, bind it to that variable, and just use your same function call.

(We follow this kind of pattern all over the place at DISQUS for various joins/odd queries)

like image 70
David Cramer Avatar answered Oct 20 '22 06:10

David Cramer