Let's say I have an Author table and a Post table, and each Author can have several Posts.
Now, with a single SQLAlchemy query, I want to get all of my active Authors and the most recent published Post for each.
I've been trying to go at this by getting a list of Posts that joinedload the Author, using a subquery to group the results together, like this:
from sqlalchemy import and_, func
from sqlalchemy.orm import joinedload

subquery = DBSession.query(Author.id, func.max(Post.publish_date).label("publish_date")) \
    .join(Post.author) \
    .filter(Post.state == 'published') \
    .filter(Author.state == 'active') \
    .group_by(Author.id) \
    .subquery()
query = DBSession.query(Post) \
    .options(joinedload(Post.author)) \
    .join(Post.author) \
    .join(subquery, and_(Author.id == subquery.c.id, 
                         Post.publish_date == subquery.c.publish_date))
But if I have two Posts from an Author with the same publish_date, and those are the newest Posts, that means I get that Author appearing twice in my results list. And while I could use a second subquery to eliminate dupes (take func.max(Post.id)), it seems like really, really the wrong way to go about this. Is there a better way to go about this?
(Again, I'm looking for a single query, so I'm trying to avoid querying on the Author table, then looping through and doing a Post query for every Author in my results.)
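To make the duplicate problem concrete, here is a minimal sketch that runs the same GROUP BY / MAX(publish_date) join shape as raw SQL against an in-memory SQLite database, using only the standard library. The table names, column names, and sample rows are assumptions made up for illustration; they are not the asker's actual schema.

```python
import sqlite3

# Hypothetical schema and data: alice has two newest posts sharing one publish_date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES author(id),
        title TEXT,
        state TEXT,
        publish_date TEXT
    );
    INSERT INTO author VALUES (1, 'alice', 'active'), (2, 'bob', 'active');
    INSERT INTO post VALUES
        (1, 1, 'a-old',  'published', '2020-01-01'),
        (2, 1, 'a-tie1', 'published', '2020-02-01'),
        (3, 1, 'a-tie2', 'published', '2020-02-01'),
        (4, 2, 'b-only', 'published', '2020-01-15');
""")

# Join each post to the per-author MAX(publish_date), as in the query above.
rows = conn.execute("""
    SELECT author.name, post.title
    FROM post
    JOIN author ON author.id = post.author_id
    JOIN (
        SELECT author.id AS id, MAX(post.publish_date) AS publish_date
        FROM post
        JOIN author ON author.id = post.author_id
        WHERE post.state = 'published' AND author.state = 'active'
        GROUP BY author.id
    ) AS latest
      ON author.id = latest.id AND post.publish_date = latest.publish_date
    ORDER BY author.name, post.id
""").fetchall()

# alice shows up twice because both of her newest posts match the max date.
print(rows)
```

Matching on publish_date alone cannot tell the two tied posts apart, which is why a tie-breaker on a unique column such as the post id is needed.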
The statement ends by calling subquery(), which tells SQLAlchemy that our intention for this query is to use it inside a bigger query instead of on its own.
A scalar subquery is a subquery that selects only one column or expression and returns one row. A scalar subquery can be used anywhere in an SQL query that a column or expression can be used.
I would do it as follows:
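from sqlalchemy.orm import aliased

LastPost = aliased(Post, name='last')
# Correlated scalar subquery: the id of the newest Post for the current Author row.
last_id = (
    DBSession.query(LastPost.id)
    .filter(LastPost.author_id == Author.id)
    .order_by(LastPost.publish_date.desc())
    .order_by(LastPost.id.desc())  # tie-breaker when publish_date is equal
    .limit(1)
    .correlate(Author)
    .as_scalar()  # deprecated since SQLAlchemy 1.4; use .scalar_subquery() there
)
query = (
    DBSession.query(Author, Post)
    .outerjoin(Post, Post.id == last_id)
)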
for author, last_post in query:
    print(author, last_post)
As you can see, each result row is an (Author, Post) pair. With outerjoin, the Post is None for an Author that has no posts.
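Change outerjoin to join if you only want authors that have at least one Post.
I left out the state filters from the question: add .filter(LastPost.state == 'published') to the subquery and .filter(Author.state == 'active') to the outer query to get only active Authors and published Posts.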
Also, I do not preload any relationship Author.post to avoid any confusion.
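For readers who want to see the SQL shape this approach corresponds to, the following is a rough sketch using the standard-library sqlite3 module rather than SQLAlchemy. The schema, the sample rows, and the names LATEST_POST_SQL, with_left, and inner are all hypothetical, and the published/active filters are taken from the question rather than from the code above. It is not the exact SQL that SQLAlchemy emits, only an equivalent correlated scalar subquery with ORDER BY ... LIMIT 1.

```python
import sqlite3

# Hypothetical data: alice has a date tie and a newer draft, carol has no posts,
# dave is inactive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES author(id),
        title TEXT,
        state TEXT,
        publish_date TEXT
    );
    INSERT INTO author VALUES
        (1, 'alice', 'active'), (2, 'bob', 'active'),
        (3, 'carol', 'active'), (4, 'dave', 'inactive');
    INSERT INTO post VALUES
        (1, 1, 'a-old',   'published', '2020-01-01'),
        (2, 1, 'a-tie1',  'published', '2020-02-01'),
        (3, 1, 'a-tie2',  'published', '2020-02-01'),
        (4, 1, 'a-draft', 'draft',     '2020-03-01'),
        (5, 2, 'b-only',  'published', '2020-01-15'),
        (6, 4, 'd-post',  'published', '2020-01-20');
""")

# The join condition picks exactly one post id per author: newest publish_date
# first, then highest id as the tie-breaker. {join} is filled in below.
LATEST_POST_SQL = """
    SELECT author.name, post.title
    FROM author
    {join} post ON post.id = (
        SELECT lp.id
        FROM post AS lp
        WHERE lp.author_id = author.id
          AND lp.state = 'published'
        ORDER BY lp.publish_date DESC, lp.id DESC
        LIMIT 1
    )
    WHERE author.state = 'active'
    ORDER BY author.name
"""

# LEFT JOIN mirrors outerjoin: authors without a published post are kept.
with_left = conn.execute(LATEST_POST_SQL.format(join="LEFT JOIN")).fetchall()
# JOIN mirrors join: only authors with at least one published post remain.
inner = conn.execute(LATEST_POST_SQL.format(join="JOIN")).fetchall()

print(with_left)
print(inner)
```

Because the subquery returns a single primary-key value per author, each author can match at most one post, so the duplicate rows from the date-only join cannot occur.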