Django QuerySet Two-Valued Subquery

Given a model

from django.db import models


class Entity(models.Model):
    identifier = models.IntegerField()
    created = models.IntegerField()
    content = models.IntegerField()

    class Meta:
        unique_together = (('identifier', 'created'),)

I would like to query for all objects whose created value is maximal among the objects that share the same identifier.
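To pin down the intended result independently of any database, here is a plain-Python sketch of the same selection over made-up sample rows. The Row tuple and the function name latest_per_identifier are hypothetical stand-ins, not part of the question's code.

```python
from typing import NamedTuple


class Row(NamedTuple):
    # Hypothetical mirror of the Entity model's three fields.
    identifier: int
    created: int
    content: int


def latest_per_identifier(rows: list[Row]) -> list[Row]:
    """For each identifier, keep the row with the largest created value.

    Because (identifier, created) is unique, there is exactly one such
    row per identifier.
    """
    best: dict[int, Row] = {}
    for row in rows:
        current = best.get(row.identifier)
        if current is None or row.created > current.created:
            best[row.identifier] = row
    # Sort for a deterministic order (tuples compare field by field).
    return sorted(best.values())


rows = [
    Row(1, 10, 100),
    Row(1, 20, 200),
    Row(2, 5, 300),
    Row(2, 7, 400),
    Row(3, 1, 500),
]
# Keeps (1, 20, 200), (2, 7, 400) and (3, 1, 500).
print(latest_per_identifier(rows))
```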

In SQL a window function in a subquery solves the problem:

SELECT identifier, content
  FROM entity
  WHERE (identifier, created)
    IN (SELECT identifier, max(created) OVER (PARTITION BY identifier)
          FROM entity);
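For trying the query locally instead of on SQL Fiddle, here is a minimal sketch that runs it against an in-memory SQLite database through Python's sqlite3 module. It assumes SQLite 3.25 or newer (window functions) and uses invented rows; the trailing ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# In-memory stand-in for the question's entity table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entity ("
    " identifier INTEGER NOT NULL,"
    " created INTEGER NOT NULL,"
    " content INTEGER NOT NULL,"
    " UNIQUE (identifier, created))"
)
conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?)",
    [(1, 10, 100), (1, 20, 200), (2, 5, 300), (2, 7, 400), (3, 1, 500)],
)

# Assumes SQLite >= 3.25 (window functions); the row-value IN needs >= 3.15.
WINDOW_QUERY = """
SELECT identifier, content
  FROM entity
 WHERE (identifier, created)
    IN (SELECT identifier, max(created) OVER (PARTITION BY identifier)
          FROM entity)
 ORDER BY identifier
"""
latest = conn.execute(WINDOW_QUERY).fetchall()
print(latest)  # [(1, 200), (2, 400), (3, 500)]
```

The window subquery returns one (identifier, max) pair per row, so pairs repeat; IN does not care about duplicates. A plain SELECT identifier, max(created) FROM entity GROUP BY identifier would produce the same set of pairs without a window function.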

See also: http://sqlfiddle.com/#!17/c541f/1/0

Both window functions and subqueries are available in Django 2.0. However, I have not found a way to express subquery expressions with multiple columns.

Is there a way to translate that SQL query into the Django QuerySet world? Or is this an XY problem, and my problem can be solved differently?

My ugly workaround is

Entity.objects.raw('''
SELECT * FROM app_entity e
 WHERE e.created = (SELECT max(f.created) FROM app_entity f WHERE e.identifier = f.identifier)''')

since the underlying sqlite3 version apparently cannot handle multi-column subqueries.
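The workaround's SQL can be checked the same way. This sketch runs the single-column correlated subquery from the raw() call against an in-memory SQLite table named app_entity (the name comes from the raw query; the rows are invented) and selects explicit columns instead of *.

```python
import sqlite3

# Table name taken from the raw() query; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_entity (id INTEGER PRIMARY KEY,"
    " identifier INTEGER, created INTEGER, content INTEGER,"
    " UNIQUE (identifier, created))"
)
conn.executemany(
    "INSERT INTO app_entity (identifier, created, content) VALUES (?, ?, ?)",
    [(1, 10, 100), (1, 20, 200), (2, 5, 300), (2, 7, 400), (3, 1, 500)],
)

# Single-column correlated subquery: needs neither row values nor window functions.
latest = conn.execute(
    """
    SELECT e.identifier, e.created, e.content FROM app_entity e
     WHERE e.created = (SELECT max(f.created) FROM app_entity f
                         WHERE e.identifier = f.identifier)
     ORDER BY e.identifier
    """
).fetchall()
print(latest)  # [(1, 20, 200), (2, 7, 400), (3, 1, 500)]
```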

Asked May 18 '18 at 08:05 by Philipp Matthias Schäfer


1 Answer

I think you can do it another way, though I'm not sure whether it will perform better or worse than a window expression:

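from django.db.models import OuterRef, Subquery

# Correlated subquery: the newest created value for the outer row's identifier.
max_created = Entity.objects.filter(
    identifier=OuterRef('identifier')
).order_by('-created').values('created')[:1]

Entity.objects.filter(
    created=Subquery(max_created)
)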

This grabs the largest created value for a given identifier, as a correlated subquery, and then filters for only those that match.
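To see what the QuerySet is asking the database for, here is a hand-written SQL approximation run through Python's sqlite3 module on made-up data. It is a sketch of the query's shape, not the exact SQL Django generates (printing the real QuerySet's .query attribute shows that); each clause of the inner query is commented with the QuerySet call it roughly corresponds to.

```python
import sqlite3

# Made-up sample data in a table shaped like the Entity model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entity (identifier INTEGER, created INTEGER, content INTEGER,"
    " UNIQUE (identifier, created))"
)
conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?)",
    [(1, 10, 100), (1, 20, 200), (2, 5, 300), (2, 7, 400), (3, 1, 500)],
)

# Approximation of Entity.objects.filter(created=Subquery(max_created)).
LATEST_SQL = """
SELECT outer_e.identifier, outer_e.created, outer_e.content
  FROM entity AS outer_e
 WHERE outer_e.created = (
        SELECT inner_e.created                          -- values('created')
          FROM entity AS inner_e
         WHERE inner_e.identifier = outer_e.identifier  -- OuterRef('identifier')
         ORDER BY inner_e.created DESC                  -- order_by('-created')
         LIMIT 1                                        -- [:1]
       )
 ORDER BY outer_e.identifier
"""
latest = conn.execute(LATEST_SQL).fetchall()
print(latest)  # [(1, 20, 200), (2, 7, 400), (3, 1, 500)]
```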

This may need tweaking: I'm not sure if you can filter on the subquery like that, or if you need to .annotate(max_created=Subquery(max_created)).filter(created=F('max_created')) or something else horrible like that.

Also, if you are on PostgreSQL, you can use the DISTINCT ON feature to get a really neat solution:

Entity.objects.order_by('identifier', '-created').distinct('identifier')
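DISTINCT ON (identifier) keeps the first row of each identifier group under the query's ordering. That is why the ordering starts with identifier (PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions) and then sorts by -created so the newest row comes first. Since this only runs on PostgreSQL, the following plain-Python sketch mimics those semantics on made-up tuples; distinct_on_identifier is a hypothetical helper for illustration only.

```python
from itertools import groupby
from operator import itemgetter

# Made-up (identifier, created, content) rows in arbitrary order.
rows = [(2, 5, 300), (1, 10, 100), (3, 1, 500), (1, 20, 200), (2, 7, 400)]


def distinct_on_identifier(rows):
    """Mimic ORDER BY identifier, created DESC with DISTINCT ON (identifier)."""
    # Equivalent of order_by('identifier', '-created').
    ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
    # Equivalent of distinct('identifier'): first row of each identifier group.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]


print(distinct_on_identifier(rows))  # [(1, 20, 200), (2, 7, 400), (3, 1, 500)]
```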
Answered Oct 11 '22 at 16:10 by Matthew Schinckel