I'm trying to optimize database queries for a Django app. Here's a simplified example:
class Label(models.Model):
name = models.CharField(max_length=200)
# ... many other fields ...
class Thing(models.Model):
name = models.CharField(max_length=200)
labels = models.ManyToManyField(Label)
I have a function that fetches all Label
s and Thing
s and puts them into a JSON data structure, in which Thing
s refer to Label
s using their id
s (primary keys). Something like this:
{
'labels': [
{ 'id': 123, 'name': 'label foo' },
...
],
'things': [
{ 'id': 45, 'name': 'thing bar', 'labels': [ 123, ... ] },
...
]
}
What is the most efficient way of obtaining such a data structure using Django? Suppose I have L Label
s and T Thing
s, and the average Thing
has x Label
s.
Method 1:
data = {}
data['labels'] = [model_to_dict(label) for label in Label.objects.all()]
data['things'] = [model_to_dict(thing) for thing in Thing.objects.all()]
This makes (1 + 1 + T) database queries, since model_to_dict(thing)
needs to fetch the Label
s for each Thing
individually.
Method 2:
data = {}
data['labels'] = [model_to_dict(label) for label in Label.objects.all()]
data['things'] = [model_to_dict(thing) for thing in
Thing.objects.prefetch_related('labels').all()]
This makes (1 + 1 + 1) database queries only, since the Thing
s fetched now have their Label
s prefetched in a single additional query.
This is still not satisfactory. prefetch_related('labels')
will fetch many copies of the same Label
, whereas I only need their id
s. Is there any way to prefetch the id
s of the Label
s only? I tried prefetch_related('labels__id')
but that didn't work. I am also concerned that because T is large (hundreds), prefetch_related('labels')
results in a SQL query with a large IN
clause. L is much smaller (< 10), so I could do this instead:
Method 3:
data = {}
data['labels'] = [model_to_dict(label) for label in
Label.objects.prefetch_related('thing_set').all()]
things = list(Thing.objects.all())
# plug in label ids by hand, and also fetch things that have zero labels
# somehow
This results in a smaller IN
clause, but is still not satisfactory because prefetch_related('thing_set')
fetches duplicate Thing
s, if a Thing
has multiple Label
s.
Summary:
Label
and Thing
are connected by a ManyToManyField
. I am fetching all Label
s and Thing
s anyway. So how do I also fetch their many-to-many relationship efficiently?
I got it. Thanks to ilvar, whose comment to the question pointed me to through
tables.
If you don’t specify an explicit through model, there is still an implicit through model class you can use to directly access the table created to hold the association. It has three fields to link the models.
Long story short:
# Fetch all labels and things:
labels = list(Label.objects.all())
things = list(Thing.objects.all())
# Fetch all label-thing pairs:
labels_of = defaultdict(lambda: [])
for pair in Thing.labels.through.objects.filter(label__in=labels):
labels_of[pair.thing_id].append(pair.label_id)
# Put everything together:
data = {}
data['labels'] = [model_to_dict(label) for label in labels]
data['things'] = []
for thing in things:
thing_dict = model_to_dict(thing, exclude='labels')
thing_dict['labels'] = labels_of[thing.id]
data['things'].append(thing_dict)
This makes (1 + 1 + 1) queries, and does not fetch anything repeatedly. I can also change the first for loop to:
for pair in Thing.labels.through.objects.filter(thing__in=things):
in case I have more Label
s than Thing
s, which will result in a query with a smaller IN
clause.
Django-debug-toolbar's debugsqlshell
management command is superb for actually seeing the queries that a piece of code is making.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With