Django annotating with a first element of a related queryset

Problem

I am creating a database model for a simple forum. Users should be able to create threads, add posts, and attach an image to a post.

In a view I would like to display all threads and:

  • get the fields of the first post in the thread, to show an excerpt of the post, its creation date, etc. (including an optional image)
  • get the time of the last post in the thread
  • count posts in a thread
  • count the images in a thread

I believe this is not really possible without executing n queries for n threads, so the real question is how to redesign the database to make that possible.

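from django.db import models

class Thread(models.Model):
    sticky = models.BooleanField()
    ...

class Post(models.Model):
    # on_delete is required since Django 2.0
    thread = models.ForeignKey('Thread', on_delete=models.CASCADE)
    image = models.OneToOneField('Image', null=True, blank=True, default=None,
                                 on_delete=models.SET_NULL)
    date = models.DateTimeField()
    ...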

class Image(models.Model):
    image = models.ImageField(...)
    ...

My partial solution

At this point I know how to count posts and images, but I have no idea how to fetch the first post at the same time. I thought about adding an additional field to the Thread model that links to the first Post.

My current query, which forces me to fetch the first posts separately:

from django.db.models import Count, Max

Thread.objects.annotate(
    replies=Count('post'),
    images=Count('post__image'),
    last_reply=Max('post__date')  # double underscore to follow the relation
)
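As a rough illustration, the Count and Max annotations above should compile to a single query: a LEFT JOIN from threads to posts, grouped by thread, with COUNT and MAX aggregates. The sketch below reproduces that shape with Python's built-in sqlite3 module; the simplified table and column names (thread, post, thread_id, image_id, date) are assumptions, not the exact SQL Django emits.

```python
import sqlite3

# Simplified, hypothetical schema standing in for the Thread/Post models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thread (id INTEGER PRIMARY KEY);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        thread_id INTEGER REFERENCES thread(id),
        image_id INTEGER,
        date TEXT
    );
    INSERT INTO thread (id) VALUES (1), (2);
    INSERT INTO post (thread_id, image_id, date) VALUES
        (1, 10, '2014-05-01'),
        (1, NULL, '2014-05-03'),
        (1, 11, '2014-05-02');
""")

# Roughly what Count('post'), Count('post__image') and Max('post__date')
# turn into: one LEFT JOIN + GROUP BY, no per-thread queries.
# COUNT(column) skips NULLs, so posts without an image are not counted.
rows = conn.execute("""
    SELECT thread.id,
           COUNT(post.id)       AS replies,
           COUNT(post.image_id) AS images,
           MAX(post.date)       AS last_reply
    FROM thread
    LEFT JOIN post ON post.thread_id = thread.id
    GROUP BY thread.id
    ORDER BY thread.id
""").fetchall()

for row in rows:
    print(row)
```

An aggregate such as MIN(post.date) could return the first post's date, but not the other columns of that row, which is why the first post needs a different technique.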
asked May 13 '14 at 11:05 by boreq



1 Answer

You can use a Subquery to annotate on a single field from the most recent related object:

from django.db.models import OuterRef, Subquery

comments = Comment.objects.filter(
    post=OuterRef('pk')
).order_by('-timestamp').values('timestamp')
Post.objects.annotate(
    last_comment_time=Subquery(comments[:1])
)

You could annotate several fields this way, but that hurts performance: each field becomes its own correlated subquery, evaluated separately for every row. That is still better than N+1 queries, but worse than a single join.
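Applied to the question's models, the same pattern with ascending order, e.g. Subquery(Post.objects.filter(thread=OuterRef('pk')).order_by('date').values('pk')[:1]), should pick each thread's first post. The sqlite3 sketch below shows roughly what that becomes in SQL, a correlated subquery with ORDER BY ... LIMIT 1, using simplified, hypothetical tables rather than the real Django ones. Asking for two fields of the first post means two such subqueries.

```python
import sqlite3

# Hypothetical simplified tables standing in for Thread and Post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thread (id INTEGER PRIMARY KEY);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        thread_id INTEGER REFERENCES thread(id),
        image_id INTEGER,
        date TEXT
    );
    INSERT INTO thread (id) VALUES (1), (2);
    INSERT INTO post (id, thread_id, image_id, date) VALUES
        (1, 1, 10,   '2014-05-02'),
        (2, 1, NULL, '2014-05-01'),
        (3, 2, 11,   '2014-05-04');
""")

# Each annotated field is its own correlated subquery, evaluated per thread
# row: first_post_id and first_post_image repeat the same ORDER BY/LIMIT 1.
rows = conn.execute("""
    SELECT thread.id,
           (SELECT p.id FROM post p
             WHERE p.thread_id = thread.id
             ORDER BY p.date LIMIT 1) AS first_post_id,
           (SELECT p.image_id FROM post p
             WHERE p.thread_id = thread.id
             ORDER BY p.date LIMIT 1) AS first_post_image
    FROM thread
    ORDER BY thread.id
""").fetchall()

for row in rows:
    print(row)
```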

Alternatively, you can build a JSON object in a single field and annotate with that instead (jsonb_build_object is PostgreSQL-specific):

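from django.db import models
from django.db.models import OuterRef, Subquery

comments = Comment.objects.filter(
    post=OuterRef('pk')
).annotate(
    # Pack several columns into one JSON value (PostgreSQL only).
    data=models.Func(
        models.Value('author'), models.F('author'),
        models.Value('timestamp'), models.F('timestamp'),
        function='jsonb_build_object',
        output_field=models.JSONField(),
    ),
).order_by('-timestamp').values('data')
Post.objects.annotate(
    last_comment=Subquery(comments[:1])
)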

(It's even possible to get the whole object as JSON and then re-inflate it in Django, but that's a bit hacky.)
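SQLite has an analogous json_object() function, so the single-subquery JSON idea can be sketched with sqlite3 too. This illustrates the technique only; it is not what Django generates for PostgreSQL. It assumes the SQLite build includes the JSON functions (built in since SQLite 3.38 and enabled in most earlier Python builds), and the simplified post/comment tables are hypothetical. The JSON text is then decoded with json.loads, a lightweight form of the re-inflation step.

```python
import json
import sqlite3

# Hypothetical simplified tables standing in for Post and Comment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        post_id INTEGER REFERENCES post(id),
        author TEXT,
        timestamp TEXT
    );
    INSERT INTO post (id) VALUES (1), (2);
    INSERT INTO comment (post_id, author, timestamp) VALUES
        (1, 'alice', '2022-09-01'),
        (1, 'bob',   '2022-09-03');
""")

# One correlated subquery returns several fields packed into one JSON value,
# mirroring jsonb_build_object('author', author, 'timestamp', timestamp).
rows = conn.execute("""
    SELECT post.id,
           (SELECT json_object('author', c.author, 'timestamp', c.timestamp)
              FROM comment c
             WHERE c.post_id = post.id
             ORDER BY c.timestamp DESC LIMIT 1) AS last_comment
    FROM post
    ORDER BY post.id
""").fetchall()

# Decode the JSON text back into Python; posts without comments get None.
last_comments = {
    post_id: json.loads(data) if data is not None else None
    for post_id, data in rows
}
print(last_comments)
```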


Another solution is to fetch the most recent comments separately and then combine them with the posts:

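# distinct('post') (DISTINCT ON) is PostgreSQL-only; combined with this
# ordering it keeps only the newest comment for each post.
comments = Comment.objects.filter(
    ...
).distinct('post').order_by('post', '-timestamp')
posts = Post.objects.filter(...).order_by('pk')

for post, comment in zip(posts, comments):
    pass  # use the post and its latest comment here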

You need to make sure the posts and comments come back in the same order; these two queries do, since both are sorted by post. This approach also fails if some post has no comment: zip() would then pair posts with the wrong comments.
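To see the failure case concretely, the sketch below emulates PostgreSQL's DISTINCT ON (post) with ORDER BY post, timestamp DESC in plain Python (sort, then keep the first row of each group) and zips the result with the posts. The tuples stand in for model instances and are purely hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (post_id, timestamp) comment rows; post 2 has no comments.
comments = [
    (1, "2022-09-01"),
    (1, "2022-09-03"),
    (3, "2022-09-02"),
]
post_ids = [1, 2, 3]  # posts ordered by pk

# Emulate DISTINCT ON (post) ... ORDER BY post, timestamp DESC:
# sort newest first, then stably by post, and keep the first row per post.
by_time = sorted(comments, key=itemgetter(1), reverse=True)
ordered = sorted(by_time, key=itemgetter(0))
latest = [next(group) for _, group in groupby(ordered, key=itemgetter(0))]

# zip() pairs post 2 with post 3's comment and silently drops post 3.
pairs = list(zip(post_ids, latest))
mismatched = [(pid, c) for pid, c in pairs if c[0] != pid]
print(mismatched)
```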

A workaround is to put the comments into a dict keyed by post id, and then look up the matching one for each post:

comments = {
    comment.post_id: comment
    for comment in Comment.objects.distinct('post').order_by('post', '-timestamp')
}
for post in Post.objects.filter(...):
    top_comment = comments.get(post.pk)
    # top_comment is None when the post has no comments
answered Sep 27 '22 at 20:09 by Matthew Schinckel
