Let's say I have an Author table and a Post table, and each Author can have several Posts.
Now, with a single SQLAlchemy query, I want to get all of my active Authors and the most recent published Post for each.
I've been trying to do this by querying Posts that joinedload the Author, and joining against a subquery that finds the latest publish_date per Author, like this:
    subquery = DBSession.query(Author.id, func.max(Post.publish_date).label("publish_date")) \
        .join(Post.author) \
        .filter(Post.state == 'published') \
        .filter(Author.state == 'active') \
        .group_by(Author.id) \
        .subquery()
    query = DBSession.query(Post) \
        .options(joinedload(Post.author)) \
        .join(Post.author) \
        .join(subquery, and_(Author.id == subquery.c.id,
                             Post.publish_date == subquery.c.publish_date))
But if an Author has two Posts with the same publish_date, and those are the newest Posts, that Author appears twice in my results list. And while I could use a second subquery to eliminate the duplicates (taking func.max(Post.id)), it seems like really, really the wrong way to go about this. Is there a better way?
(Again, I'm looking for a single query, so I'm trying to avoid querying on the Author table, then looping through and doing a Post query for every Author in my results.)
I would do it as follows:
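    from sqlalchemy.orm import aliased

    # Alias Post so the subquery can refer to it independently of the outer Post.
    LastPost = aliased(Post, name='last')

    # Correlated scalar subquery: the id of the newest Post for the current Author.
    last_id = (
        DBSession.query(LastPost.id)
        .filter(LastPost.author_id == Author.id)
        .order_by(LastPost.publish_date.desc())
        .order_by(LastPost.id.desc())  # tie-breaker for equal publish_date
        .limit(1)
        .correlate(Author)
        .as_scalar()  # on SQLAlchemy 1.4+, use .scalar_subquery() instead
    )

    query = (
        DBSession.query(Author, Post)
        .outerjoin(Post, Post.id == last_id)
    )

    for author, last_post in query:
        print(author, last_post)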
As you can see, each result row is a pair (Author, LastPost); with outerjoin, the second element is None for an Author that has no Post.
Change outerjoin to join if you only want authors that have at least one Post.
Also, I do not preload the Author.post relationship, to avoid any confusion.
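To see what the correlated subquery does at the SQL level, here is a minimal sketch using only the standard-library sqlite3 module. The table names, column names, and sample rows are assumptions inferred from the model attributes in the question, and the SQL is hand-written to mirror the approach, not the exact statement SQLAlchemy would emit. It also adds the 'active' and 'published' filters from the question, which the ORM query above leaves out.

```python
import sqlite3

# In-memory database with a hypothetical schema matching the question's models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES author(id),
        title TEXT,
        state TEXT,
        publish_date TEXT
    );
    INSERT INTO author VALUES
        (1, 'alice', 'active'),
        (2, 'bob', 'active'),
        (3, 'carol', 'inactive');
    -- alice has two newest published posts sharing the same publish_date.
    INSERT INTO post VALUES
        (1, 1, 'old',    'published', '2020-01-01'),
        (2, 1, 'tie-a',  'published', '2020-02-01'),
        (3, 1, 'tie-b',  'published', '2020-02-01'),
        (4, 1, 'draft',  'draft',     '2020-03-01'),
        (5, 3, 'hidden', 'published', '2020-01-01');
""")

# For each active author, join the single post whose id is returned by the
# correlated subquery: newest publish_date first, highest id as tie-breaker.
rows = conn.execute("""
    SELECT author.name, post.title
    FROM author
    LEFT OUTER JOIN post ON post.id = (
        SELECT last_post.id
        FROM post AS last_post
        WHERE last_post.author_id = author.id
          AND last_post.state = 'published'
        ORDER BY last_post.publish_date DESC, last_post.id DESC
        LIMIT 1
    )
    WHERE author.state = 'active'
    ORDER BY author.id
""").fetchall()

for name, title in rows:
    print(name, title)
```

Because the subquery returns at most one id per author, the tie on publish_date cannot produce a duplicate author row, and an author with no published post still appears with None in place of the post.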