I have a QuerySet of some objects. I want to annotate each one with the minimum value of a related model (joined on a few conditions, ordered by date). I can express the result I want neatly in SQL, but I'm curious how to translate it to Django's ORM.
Let's say that I have two related models, Book and BlogPost, each with a foreign key to an Author:
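from django.db import models

# Author is assumed to be defined elsewhere in this module.


class Book(models.Model):
    title = models.CharField(max_length=255)
    genre = models.CharField(max_length=63)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    date_published = models.DateField()


class BlogPost(models.Model):
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    date_published = models.DateField()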
I'm trying to find the first mystery book that a given author published after each blog post that they write. In SQL, this can be achieved nicely with windowing.
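To pin down exactly what result is wanted, here is a minimal pure-Python sketch of the same logic with no database involved. The Book and BlogPost dataclasses below are hypothetical stand-ins for the models (authors are plain integer ids), and the sample rows are invented: for each post it picks the author's earliest mystery book published on or after the post's date, or None if there is no such book.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Book:
    # Hypothetical stand-in for the Book model; author is just an integer id.
    title: str
    genre: str
    author: int
    date_published: date


@dataclass
class BlogPost:
    # Hypothetical stand-in for the BlogPost model.
    id: int
    author: int
    date_published: date


def first_mystery_after(posts: List[BlogPost], books: List[Book]) -> Dict[int, Optional[str]]:
    """Map each post id to the title of the author's first mystery book
    published on or after the post, or None if there is no such book."""
    result: Dict[int, Optional[str]] = {}
    for post in posts:
        # Same conditions as the JOIN ... ON clause in the SQL.
        candidates = [
            b for b in books
            if b.author == post.author
            and b.genre == "mystery"
            and b.date_published >= post.date_published
        ]
        # The earliest candidate plays the role of the row with rn = 1.
        first = min(candidates, key=lambda b: b.date_published, default=None)
        result[post.id] = first.title if first is not None else None
    return result


books = [
    Book("Early Clue", "mystery", 1, date(2017, 1, 10)),
    Book("Second Clue", "mystery", 1, date(2017, 6, 1)),
    Book("Space Saga", "scifi", 1, date(2017, 3, 1)),
]
posts = [
    BlogPost(1, 1, date(2017, 1, 1)),  # first later mystery: Early Clue
    BlogPost(2, 1, date(2017, 2, 1)),  # scifi book ignored: Second Clue
    BlogPost(3, 1, date(2018, 1, 1)),  # no later mystery: None
]
print(first_mystery_after(posts, books))
# {1: 'Early Clue', 2: 'Second Clue', 3: None}
```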
WITH ordered AS (
    SELECT blog_post.id,
           book.title,
           ROW_NUMBER() OVER (
               PARTITION BY blog_post.id ORDER BY book.date_published
           ) AS rn
    FROM blog_post
    LEFT JOIN book ON book.author_id = blog_post.author_id
                  AND book.genre = 'mystery'
                  AND book.date_published >= blog_post.date_published
)
SELECT id,
       title
FROM ordered
WHERE rn = 1;
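One way to sanity-check the window query locally is to run it against an in-memory SQLite database using Python's built-in sqlite3 module. This minimal sketch assumes an SQLite build of 3.25 or later (the first release with window functions); the table layout mirrors the models, the sample rows are invented, and an ORDER BY id is added only to make the output order deterministic.

```python
import sqlite3

WINDOW_SQL = """
WITH ordered AS (
    SELECT blog_post.id,
           book.title,
           ROW_NUMBER() OVER (
               PARTITION BY blog_post.id ORDER BY book.date_published
           ) AS rn
    FROM blog_post
    LEFT JOIN book ON book.author_id = blog_post.author_id
                  AND book.genre = 'mystery'
                  AND book.date_published >= blog_post.date_published
)
SELECT id, title
FROM ordered
WHERE rn = 1
ORDER BY id;
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (
        id INTEGER PRIMARY KEY, title TEXT, genre TEXT,
        author_id INTEGER, date_published TEXT
    );
    CREATE TABLE blog_post (
        id INTEGER PRIMARY KEY, author_id INTEGER, date_published TEXT
    );
    -- ISO-8601 date strings compare correctly as plain text.
    INSERT INTO book (title, genre, author_id, date_published) VALUES
        ('Early Clue',  'mystery', 1, '2017-01-10'),
        ('Second Clue', 'mystery', 1, '2017-06-01'),
        ('Space Saga',  'scifi',   1, '2017-03-01');
    INSERT INTO blog_post (id, author_id, date_published) VALUES
        (1, 1, '2017-01-01'),
        (2, 1, '2017-02-01'),
        (3, 1, '2018-01-01');
""")

rows = conn.execute(WINDOW_SQL).fetchall()
print(rows)
# [(1, 'Early Clue'), (2, 'Second Clue'), (3, None)]
```

Because of the LEFT JOIN, a post with no later mystery book still yields one row (with a NULL title and rn = 1), so it is kept rather than dropped.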
While the above SQL suits my needs well (and I could use raw SQL if needed), I'm curious how one would do this with the QuerySet API. I have an existing QuerySet that I'd like to annotate further:
books = models.Book.objects.filter(...).select_related(...).prefetch_related(...)
annotated_books = books.annotate(
    most_recent_title=...
)
I'm aware that Django 2.0 supports window functions, but I'm on Django 1.10 for now.
I first built a Q object to filter down to mystery books published after the blog post:
from django.db.models import F, Q

published_after = Q(
    author__book__date_published__gte=F('date_published'),
    author__book__genre='mystery',
)
From here, I attempted to combine django.db.models.Min with additional F objects to achieve the desired result, but with no success.
Perhaps using .raw() isn't such a bad idea. Looking at the source of Django's Window class, we can see that it essentially just composes an SQL query to achieve the windowing.
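An easy way out may be the architect module, which, according to its documentation, adds partitioning functionality for PostgreSQL. Note that this is table partitioning, which is a different thing from the PARTITION BY clause of a window function.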
Another module that claims to add window functionality to Django < 2.0 is django-query-builder, which provides a partition_by() method on its QueryWindow object that can be combined with order_by():
query = Query().from_table(
    Order,
    ['*', RowNumberField(
        'revenue',
        over=QueryWindow().order_by('margin').partition_by('account_id')
    )]
)
query.get_sql()
# SELECT tests_order.*, ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY margin ASC) AS revenue_row_number
# FROM tests_order
Finally, you can always copy the Window class source code into your project, or use this alternate Window class code.
Your apparent problem is that Django 1.10 is too old to handle window functions properly (which have been around for a very long time already).
That problem goes away if you rewrite your query without a window function.
Which of the following rewrites is fastest depends on the available indexes and the data distribution, but each of them should be faster than your original.
1. With DISTINCT ON:
SELECT DISTINCT ON (p.id)
       p.id, b.title
FROM   blog_post p
LEFT   JOIN book b ON b.author_id = p.author_id
                  AND b.genre = 'mystery'
                  AND b.date_published >= p.date_published
ORDER  BY p.id, b.date_published;
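DISTINCT ON (p.id) keeps only the first row of each p.id group, where "first" is decided by the ORDER BY; that is why p.id has to lead the ORDER BY list. SQLite has no DISTINCT ON, so this minimal sketch emulates the rule in plain Python with itertools.groupby over hypothetical joined rows of (post_id, book_date, title), where None stands in for the NULLs the LEFT JOIN produces when no book matches.

```python
from itertools import groupby
from typing import List, Optional, Tuple

# Hypothetical joined rows: (post_id, book_date, title); None means no matching book.
Row = Tuple[int, Optional[str], Optional[str]]


def distinct_on_first(rows: List[Row]) -> List[Tuple[int, Optional[str]]]:
    """Emulate SELECT DISTINCT ON (post_id) post_id, title ... ORDER BY post_id, book_date."""
    # Sort like ORDER BY post_id, book_date; PostgreSQL puts NULLs last in
    # ascending order, so rows without a date sort after dated ones.
    ordered = sorted(rows, key=lambda r: (r[0], r[1] is None, r[1] or ""))
    # Keep only the first row of each post_id group.
    return [
        (post_id, next(group)[2])
        for post_id, group in groupby(ordered, key=lambda r: r[0])
    ]


joined = [
    (1, "2017-06-01", "Second Clue"),
    (1, "2017-01-10", "Early Clue"),
    (2, "2017-06-01", "Second Clue"),
    (3, None, None),
]
print(distinct_on_first(joined))
# [(1, 'Early Clue'), (2, 'Second Clue'), (3, None)]
```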
2. With a LATERAL subquery (requires Postgres 9.3 or later):
SELECT p.id, b.title
FROM   blog_post p
LEFT   JOIN LATERAL (
   SELECT title
   FROM   book
   WHERE  author_id = p.author_id
   AND    genre = 'mystery'
   AND    date_published >= p.date_published
   ORDER  BY date_published
   LIMIT  1
   ) b ON true;
-- ORDER BY p.id  -- optional
3. Or, simpler yet, with a correlated subquery:
SELECT p.id
     , (SELECT title
        FROM   book
        WHERE  author_id = p.author_id
        AND    genre = 'mystery'
        AND    date_published >= p.date_published
        ORDER  BY date_published
        LIMIT  1)
FROM   blog_post p;
-- ORDER BY p.id  -- optional
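Because option 3 is just a correlated subquery with LIMIT 1, it also runs on SQLite, so a minimal sketch can check it with Python's built-in sqlite3 module. Table names and sample rows are invented for illustration, and a second author is included to show that the subquery really is correlated on author_id. For reference, Django 1.11 added the Subquery and OuterRef expressions, which correspond to this shape of query in the ORM.

```python
import sqlite3

CORRELATED_SQL = """
SELECT p.id,
       (SELECT title
        FROM   book
        WHERE  author_id = p.author_id
        AND    genre = 'mystery'
        AND    date_published >= p.date_published
        ORDER  BY date_published
        LIMIT  1) AS title
FROM   blog_post p
ORDER  BY p.id;
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (
        id INTEGER PRIMARY KEY, title TEXT, genre TEXT,
        author_id INTEGER, date_published TEXT
    );
    CREATE TABLE blog_post (
        id INTEGER PRIMARY KEY, author_id INTEGER, date_published TEXT
    );
    INSERT INTO book (title, genre, author_id, date_published) VALUES
        ('Early Clue',  'mystery', 1, '2017-01-10'),
        ('Second Clue', 'mystery', 1, '2017-06-01'),
        ('Space Saga',  'scifi',   1, '2017-03-01'),
        ('Locked Room', 'mystery', 2, '2017-03-01');
    INSERT INTO blog_post (id, author_id, date_published) VALUES
        (1, 1, '2017-01-01'),
        (2, 1, '2017-02-01'),
        (3, 1, '2018-01-01'),
        (4, 2, '2017-02-15');
""")

rows = conn.execute(CORRELATED_SQL).fetchall()
print(rows)
# [(1, 'Early Clue'), (2, 'Second Clue'), (3, None), (4, 'Locked Room')]
```

Post 2 gets 'Second Clue' rather than author 2's earlier 'Locked Room' because the WHERE clause ties each subquery to the outer row's author_id; a post with no qualifying book simply gets NULL.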
Each of these should translate easily to Django syntax. You might also just use the raw SQL; that's what is sent to the Postgres server anyway.