
Most efficient Django query to return results spanning multiple tables

Tags: python, django

I am trying to write a fairly complex Django query as efficiently as possible, and I am not sure how to get started. I have these models (this is a simplified version):

class Status(models.Model):
    status = models.CharField(max_length=200)

class User(models.Model):
    name = models.CharField(max_length=200)

class Event(models.Model):
    user = models.ForeignKey(User)

class EventItem(models.Model):
    event = models.ForeignKey(Event)
    rev1 = models.ForeignKey(Status, related_name='rev1', blank=True, null=True)
    rev2 = models.ForeignKey(Status, related_name='rev2', blank=True, null=True)
    active = models.BooleanField()

I want to create a query that returns a list of Users, ordered by who has the most events in which all dependent EventItems have rev1 and rev2 set (not blank or null) and active = True.

I know I could do this by iterating through the list of users and then checking all their events for the matching rev1, rev2, and active criteria and then return those events, but this is heavy on the database. Any suggestions?

Thanks!

Jono Bacon asked Jan 14 '23 02:01



2 Answers

Your model is broken, but this should sum up what you were doing in a cleaner way.

class Status(models.Model):
    status = models.CharField(max_length=200)

class User(models.Model):
    name = models.CharField(max_length=200)
    events = models.ManyToManyField('Event')

class Event(models.Model):
    rev1 = models.ForeignKey(Status, related_name='rev1', blank=True, null=True)
    rev2 = models.ForeignKey(Status, related_name='rev2', blank=True, null=True)
    active = models.BooleanField()

And the query

from django.db.models import Count, Q

User.objects.filter(events__active=True).exclude(
    Q(events__rev1=None) | Q(events__rev2=None)
).annotate(num_events=Count('events')).order_by('-num_events')

This will return a queryset of users, sorted from most to fewest by num_events, the number of events counted for each user.

For more information check out Many-To-Many fields.

Roman Alexander answered Jan 19 '23 12:01



I want to create a query that returns a list of Users, ordered by who has the most events in which all dependent EventItems have rev1 and rev2 set (not blank or null) and active = True.

First, you want the Event objects whose EventItems all meet these criteria.

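# active, rev1 and rev2 live on EventItem, so drop every Event that has
# at least one EventItem failing a condition.
events = Event.objects.exclude(eventitem__active=False)
events = events.exclude(eventitem__rev1__isnull=True)
events = events.exclude(eventitem__rev2__isnull=True)
# rev1 and rev2 are foreign keys, so "blank" can only mean NULL; there is
# no empty-string value to exclude.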

Also, you didn't specify if you wanted to deal with Event objects that have no EventItem. You can filter those out with:

events = events.exclude(eventitem__isnull=True)

Note that events may contain plenty of duplicates. You can throw in an events.distinct() if you like, but should only do that if you need it human-readable.
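
To make the "every EventItem must qualify" semantics concrete, here is a minimal sketch in plain SQL, run through Python's sqlite3 module so it works without a Django project. The table and column names (event, eventitem, rev1_id and so on) are assumptions for illustration, and this is not the exact SQL Django generates; it only shows the idea of dropping any event that has a failing item, plus the optional check for events with no items at all.

```python
import sqlite3

# Hypothetical schema: real Django tables are prefixed with the app label.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
    CREATE TABLE eventitem (
        id INTEGER PRIMARY KEY,
        event_id INTEGER NOT NULL REFERENCES event(id),
        rev1_id INTEGER,
        rev2_id INTEGER,
        active INTEGER NOT NULL
    );
    INSERT INTO event (id, user_id) VALUES (1, 10), (2, 10), (3, 20), (4, 20);
    INSERT INTO eventitem (event_id, rev1_id, rev2_id, active) VALUES
        (1, 1, 2, 1),     -- event 1: both items qualify
        (1, 2, 1, 1),
        (2, 1, NULL, 1),  -- event 2: one item has no rev2
        (2, 1, 2, 1),
        (3, 1, 2, 0);     -- event 3: its only item is inactive
                          -- event 4: no items at all
""")

# Keep an event only if no related item fails a condition.
ALL_ITEMS_QUALIFY = """
    SELECT e.id FROM event AS e
    WHERE NOT EXISTS (
        SELECT 1 FROM eventitem AS i
        WHERE i.event_id = e.id
          AND (i.active = 0 OR i.rev1_id IS NULL OR i.rev2_id IS NULL)
    )
"""
# Optional extra condition: the event must have at least one item.
HAS_ITEMS = " AND EXISTS (SELECT 1 FROM eventitem AS i WHERE i.event_id = e.id)"


def ids(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


without_item_check = ids(ALL_ITEMS_QUALIFY)
with_item_check = ids(ALL_ITEMS_QUALIFY + HAS_ITEMS)
print(without_item_check, with_item_check)
```

An event with no items passes the first query because it has nothing that can fail, which is why the extra "has items" condition is a separate decision.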

Once you have those, you can now extract the User objects that you want:

users = User.objects.filter(event__in=events)

Note that on certain database backends, *ahem* MySQL *ahem*, you may find that the .filter(field__in=QuerySet) pattern is really slow. For that case, the code should be:

users = User.objects.filter(event__in=list(events.values_list('pk', flat=True)))
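
The two-step pattern above can be sketched outside Django with sqlite3: run the inner query on its own, pull the ids into Python, then pass them back as a literal IN list. Everything here is illustrative: app_user, event and the ok column (a stand-in for the real filter conditions) are hypothetical names, and whether this is actually faster depends on the backend and the size of the id list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE event (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES app_user(id),
        ok INTEGER NOT NULL  -- hypothetical stand-in for the real conditions
    );
    INSERT INTO app_user (id, name) VALUES (10, 'ann'), (20, 'bob'), (30, 'cy');
    INSERT INTO event (id, user_id, ok) VALUES
        (1, 10, 1), (2, 10, 1), (3, 20, 0), (4, 30, 1);
""")


def users_for_events(conn):
    # Step 1: evaluate the inner query by itself and keep the ids in Python.
    event_ids = [row[0] for row in conn.execute("SELECT id FROM event WHERE ok = 1")]
    if not event_ids:
        return []  # an empty IN () list is rejected by several backends

    # Step 2: send the ids back as bound parameters instead of a subquery.
    placeholders = ", ".join("?" for _ in event_ids)
    sql = (
        "SELECT DISTINCT u.name FROM app_user AS u "
        "JOIN event AS e ON e.user_id = u.id "
        f"WHERE e.id IN ({placeholders}) ORDER BY u.name"
    )
    return [row[0] for row in conn.execute(sql, event_ids)]


names = users_for_events(conn)
print(names)
```

The trade-off is an extra round trip and a list held in memory, so this only pays off while the id list stays reasonably small.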

You may then order things by the number of Event objects attached:

from django.db.models import Count
active_users = users.annotate(num_events=Count('event')).order_by('-num_events')
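
When testing a query like this, it can help to compare it against a brute-force version on a small fixture. The following is only a sketch of such a reference check in plain Python: rank_users and its input shapes are hypothetical, it treats events with no items as non-qualifying, and it is meant for unit tests rather than production use.

```python
from collections import Counter


def rank_users(events, items):
    """Return [(user_id, qualifying_event_count), ...], highest count first.

    events: dict mapping event_id -> user_id
    items:  iterable of (event_id, rev1, rev2, active) tuples
    """
    has_items = set()
    disqualified = set()
    for event_id, rev1, rev2, active in items:
        has_items.add(event_id)
        # A single failing item disqualifies the whole event.
        if rev1 is None or rev2 is None or not active:
            disqualified.add(event_id)

    counts = Counter(
        user_id
        for event_id, user_id in events.items()
        if event_id in has_items and event_id not in disqualified
    )
    return counts.most_common()


events = {1: 10, 2: 10, 3: 20, 4: 20, 5: 30}
items = [
    (1, 1, 2, True),
    (2, 1, 2, True),
    (3, 1, 2, True),
    (4, None, 2, True),  # rev1 missing: event 4 does not count
    (5, 1, 2, False),    # inactive: event 5 does not count
]
ranking = rank_users(events, items)
print(ranking)
```

Users with no qualifying events are left out of the ranking, which matches what filtering on event__in does in the queryset version.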

Simon Law answered Jan 19 '23 11:01
