I have a queryset that returns a lot of data, it can be filtered by year which will return around 100k lines, or show all which will bring around 1 million lines.
The objective of this annotate is to generate a xlsx spreadsheet.
Models representation, RelatedModel
is manytomany between Model
and AnotherModel
Model:
id
field1
field2
field3
RelatedModel:
foreign_key_model (Model)
foreign_key_another (AnotherModel)
Queryset, if the relation exists it will annotate, this annotate is very slow and can take several minutes.
Model.objects.all().annotate(
related_exists=Exists(RelatedModel.objects.filter(foreign_key_model=OuterRef('id'))),
related_column=Case(
When(related_exists=True, then=Value('The relation exists!')),
When(related_exists=False, then=Value('The relation doesn't exist!')),
default=Value('This is the default value!'),
output_field=CharField(),
)
).values_list(
'related_column',
'field1',
'field2',
'field3'
)
This is because a Django QuerySet is a lazy object. It contains all of the information it needs to populate itself from the database, but will not actually do so until the information is needed.
In the Django framework, both annotate and aggregate are responsible for identifying a given value set summary. Among these, annotate identifies the summary from each of the items in the queryset. Whereas in the case of aggregate, the summary is calculated for the entire queryset.
In the Django QuerySet API, F() expressions are used to refer to model field values directly in the database.
Django's ORM is fantastic. It's slow because it chooses to be convenient but if it needs to be fast it's just a few slight API calls away. If you're curious, check out the code on Github.
If only thing needed is to change how True / False is displayed in xlsx - one option is to just have one related_exists
BooleanField annotation and later customize how it will be converted when creating xlsx document - i.e. in serializer. Database should store raw / unformatted values, and app prepare them to be shown to user.
Other things to consider:
But let's look into raw sql of original query. It will be like this:
SELECT [model_fields],
EXISTS([CLIENT_SELECT]) AS related_exists,
CASE
WHEN EXISTS([CLIENT_SELECT]) = true THEN 'The relation exists!'
WHEN EXISTS([CLIENT_SELECT]) = true THEN 'The relation does not exist!'
ELSE 'The relation exists!'
END AS related_column
FROM model;
And right away we can see nested query for Exists CLIENT_SELECT is there 3 times. Even though it is exactly the same, it may be executed minimum 2 times and up to 3 times. Database may optimize it to be faster than 3x, but it still is not optimal as 1x.
First, EXISTS
returns either True or False, we can leave just one check that it is True, making 'The relation does not exist!'
the default value.
related_column=Case(
When(related_exists=True, then=Value('The relation exists!')),
default=Value('The relation does not exist!')
Why related_column
performs same select again and not takes the value of related_exists
?
Because we cannot reference calculated columns while calculating another columns - and this is database level constraint django knows about and duplicates expression.
Wait, then we actually do not need related_exists
column, lets just leave related_column
with CASE statement and 1 exists subquery.
Here comes Django - we cannot (till 3.0) use expressions in filters without annotating them first.
So, it our case it is like: in order to use Exist
in When
, we first need to add it as annotation, but it won't be used as a reference, but a full copy of expression.
Good news!
Since Django 3.0 we can use expressions that output BooleanField directly in QuerySet filters, without having to first annotate. Exists
is one of such BooleaField expressions.
Model.objects.all().annotate(
related_column=Case(
When(
Exists(RelatedModel.objects.filter(foreign_key_model=OuterRef('id'))),
then=Value('The relation exists!'),
),
default=Value('The relation doesn't exist!'),
output_field=CharField(),
)
)
And only one nested select, and one annotated field.
Django 2.1, 2.2
Here's the commit that finalized allowance of boolean expressions although many pre-conditions for it were added earlier. One of them is presence of conditional
attribute on expression object and check for this attribute.
So, although not recommended and not tested it seems quite working little hack for Django 2.1, 2.2 (before there was no conditional
check, and it will require more intrusive changes):
Exists
expression instanceconditional = True
When
statementrelated_model_exists = Exists(RelatedModel.objects.filter(foreign_key_model=OuterRef('id')))
setattr(related_model_exists, 'conditional', True)
Model.objects.all().annotate(
related_column=Case(
When(
relate_model_exists,
then=Value('The relation exists!'),
),
default=Value('The relation doesn't exist!'),
output_field=CharField(),
)
)
Related checks
relatedmodel_set__isnull=True
check is not suitable for several reasons:
LEFT OUTER JOIN
- that is less efficient than EXISTS
LEFT OUTER JOIN
- it joins tables, this makes it ONLY suitable in filter() condition (not in annotate - When), and only for OneToOne or OneToMany (One is on relatedmodel side) relationsYou can considerably simplify your query to:
from django.db.models import Count
Model.objects.all().annotate(
related_column=Case(
When(relatedmodel_set__isnull=True, then=Value("The relation doesn't exist!")),
default=Value("The relation exists!"),
output_field=CharField()
)
)
Where relatedmodel_set
is the related_name
on your foreign key.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With