Given a model:
from django.db import models
class Entity(models.Model):
    identifier = models.IntegerField()
    created = models.IntegerField()
    content = models.IntegerField()
    class Meta:
        # At most one row per (identifier, created) pair.
        unique_together = (('identifier', 'created'),)
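unique_together is enforced by the database as a composite unique constraint. That matters here: with at most one row per (identifier, created) pair there can be no ties, so each identifier has exactly one row with the maximal created. A minimal sqlite3 sketch of that constraint, using a hand-written in-memory table named entity (an assumption, matching the SQL in the question) rather than the table Django would actually generate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hand-written stand-in for the Entity table; the composite UNIQUE constraint
# plays the role of unique_together = (('identifier', 'created'),).
conn.execute(
    """
    CREATE TABLE entity (
        id INTEGER PRIMARY KEY,
        identifier INTEGER NOT NULL,
        created INTEGER NOT NULL,
        content INTEGER NOT NULL,
        UNIQUE (identifier, created)
    )
    """
)
conn.execute("INSERT INTO entity (identifier, created, content) VALUES (1, 10, 100)")
try:
    # Same (identifier, created) pair with different content: must be rejected.
    conn.execute("INSERT INTO entity (identifier, created, content) VALUES (1, 10, 999)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
row_count = conn.execute("SELECT count(*) FROM entity").fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
```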
I would like to query for all objects whose created value is maximal among objects with a common identifier.
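To pin down the intended result independently of any database, here is a plain-Python reference over hypothetical (identifier, created, content) tuples; it is only a specification sketch, not how the query should be executed:

```python
# Hypothetical sample rows: (identifier, created, content).
rows = [
    (1, 10, 100), (1, 20, 101),
    (2, 5, 200), (2, 15, 201), (2, 7, 202),
    (3, 1, 300),
]
# For every identifier, remember the row with the largest created value.
latest = {}
for identifier, created, content in rows:
    best = latest.get(identifier)
    if best is None or created > best[1]:
        latest[identifier] = (identifier, created, content)
result = sorted(latest.values())
print(result)  # [(1, 20, 101), (2, 15, 201), (3, 1, 300)]
```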
In SQL a window function in a subquery solves the problem:
SELECT identifier, content
FROM entity
WHERE (identifier, created)
IN (SELECT identifier, max(created) OVER (PARTITION BY identifier)
FROM entity);
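The window-function query can be tried against an in-memory sqlite3 database with hypothetical sample rows. It relies on row values in IN (added in SQLite 3.15.0) and window functions (added in SQLite 3.25.0), so it is a sketch that assumes a reasonably recent SQLite; an ORDER BY is added only to make the output deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (identifier INTEGER, created INTEGER, content INTEGER)")
# Hypothetical sample rows: (identifier, created, content).
conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?)",
    [(1, 10, 100), (1, 20, 101), (2, 5, 200), (2, 15, 201), (2, 7, 202), (3, 1, 300)],
)
query = """
    SELECT identifier, content
    FROM entity
    -- Row-value IN: compare the (identifier, created) pair against the
    -- per-partition maximum produced by the window function.
    WHERE (identifier, created) IN (
        SELECT identifier, max(created) OVER (PARTITION BY identifier)
        FROM entity
    )
    ORDER BY identifier
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 101), (2, 201), (3, 300)]
```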
See also: http://sqlfiddle.com/#!17/c541f/1/0
Both window functions and subqueries are available in Django 2.0. However, I have not found a way to express subquery expressions with multiple columns.
Is there a way to translate that SQL query into the Django QuerySet world? Or is this an XY problem, and can my problem be solved differently?
My ugly workaround is
Entity.objects.raw('''
SELECT * FROM app_entity e
WHERE e.created = (SELECT max(f.created) FROM app_entity f WHERE e.identifier = f.identifier)''')
since the underlying sqlite3 version apparently cannot handle multi-column subqueries.
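The workaround is a correlated subquery: for each outer row e, the inner SELECT logically computes max(created) over the rows sharing e's identifier, so it needs neither row values nor window functions. A small sqlite3 sketch with hypothetical sample rows (the table name app_entity is taken from the raw query; only identifier and content are selected, for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_entity (id INTEGER PRIMARY KEY, identifier INTEGER, created INTEGER, content INTEGER)"
)
# Hypothetical sample rows: (identifier, created, content).
conn.executemany(
    "INSERT INTO app_entity (identifier, created, content) VALUES (?, ?, ?)",
    [(1, 10, 100), (1, 20, 101), (2, 5, 200), (2, 15, 201), (2, 7, 202), (3, 1, 300)],
)
query = """
    SELECT e.identifier, e.content
    FROM app_entity e
    -- The inner query refers to the outer alias e, making it correlated.
    WHERE e.created = (
        SELECT max(f.created) FROM app_entity f WHERE e.identifier = f.identifier
    )
    ORDER BY e.identifier
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 101), (2, 201), (3, 300)]
```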
Django allows using SQL subqueries.
OuterRef acts like an F expression, except that the check that it refers to a valid field isn't made until the outer queryset is resolved.
In the Django QuerySet API, F() expressions are used to refer to model field values directly in the database.
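I think you can do it another way (but I'm not sure whether it will perform better or worse than a window expression):
from django.db.models import OuterRef, Subquery
# Latest created value among rows sharing the outer row's identifier.
max_created = Entity.objects.filter(
    identifier=OuterRef('identifier')
).order_by('-created').values('created')[:1]
# Keep only the rows whose created equals that latest value.
Entity.objects.filter(
    created=Subquery(max_created)
)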
This grabs the largest created value for a given identifier as a correlated subquery, and then filters for only those rows that match.
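In SQL terms, that Subquery is a correlated subquery that sorts the rows with the outer row's identifier by created descending and keeps the first value (the [:1] slice becomes a LIMIT 1). The hand-written sqlite3 query below sketches that shape against hypothetical sample rows; it is not the SQL Django actually emits, which will differ in quoting and aliases:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (identifier INTEGER, created INTEGER, content INTEGER)")
# Hypothetical sample rows: (identifier, created, content).
conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?)",
    [(1, 10, 100), (1, 20, 101), (2, 5, 200), (2, 15, 201), (2, 7, 202), (3, 1, 300)],
)
query = """
    SELECT e.identifier, e.content
    FROM entity e
    WHERE e.created = (
        -- Roughly order_by('-created').values('created')[:1], correlated on identifier.
        SELECT f.created FROM entity f
        WHERE f.identifier = e.identifier
        ORDER BY f.created DESC
        LIMIT 1
    )
    ORDER BY e.identifier
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 101), (2, 201), (3, 300)]
```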
This may need tweaking: I'm not sure whether you can filter on the subquery like that, or whether you need something like .annotate(max_created=Subquery(max_created)).filter(created=F('max_created')) (with F also imported from django.db.models), or something else horrible like that.
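The annotate-then-filter variant first attaches the subquery's value to each row and only then compares it with created. One way to write that idea in plain SQL is a derived table: the annotation becomes a scalar subquery in the inner SELECT list, and the F('max_created') comparison is applied in the outer WHERE. This is a hand-written sketch with hypothetical data and sqlite3 as a stand-in database, not a claim about the exact SQL Django generates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (identifier INTEGER, created INTEGER, content INTEGER)")
# Hypothetical sample rows: (identifier, created, content).
conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?)",
    [(1, 10, 100), (1, 20, 101), (2, 5, 200), (2, 15, 201), (2, 7, 202), (3, 1, 300)],
)
query = """
    SELECT identifier, content
    FROM (
        -- Step 1 (like annotate): expose the latest created per identifier as a column.
        SELECT e.identifier, e.created, e.content,
               (SELECT f.created FROM entity f
                WHERE f.identifier = e.identifier
                ORDER BY f.created DESC LIMIT 1) AS max_created
        FROM entity e
    ) AS annotated
    -- Step 2 (like filter(created=F('max_created'))): keep rows matching it.
    WHERE created = max_created
    ORDER BY identifier
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 101), (2, 201), (3, 300)]
```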
Also, if you are on PostgreSQL, you can use the DISTINCT ON feature to get a really neat solution:
Entity.objects.order_by('identifier', '-created').distinct('identifier')
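DISTINCT ON (identifier) keeps the first row of each identifier group in ORDER BY order, so ordering by created descending within each group leaves the latest row. Because DISTINCT ON is PostgreSQL-only, this sketch emulates those semantics in plain Python over hypothetical (identifier, created, content) tuples rather than running the real query:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (identifier, created, content).
rows = [
    (1, 10, 100), (1, 20, 101),
    (2, 5, 200), (2, 15, 201), (2, 7, 202),
    (3, 1, 300),
]
# Like ORDER BY identifier, created DESC.
ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
# Like DISTINCT ON (identifier): keep only the first row of each identifier group.
result = [next(group) for _, group in groupby(ordered, key=itemgetter(0))]
print(result)  # [(1, 20, 101), (2, 15, 201), (3, 1, 300)]
```

Django's documentation requires the fields in order_by() to start with the fields passed to distinct(), in the same order, which is why identifier leads the ordering in the queryset.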