I'm trying to have my popular_query subquery remove dupe Place.id, but it doesn't remove it. This is the code below. I tried using distinct but it does not respect the order_by rule.
SimilarPost = aliased(Post)
SimilarPostOption = aliased(PostOption)
popular_query = (db.session.query(Post, func.count(SimilarPost.id)).
join(Place, Place.id == Post.place_id).
join(PostOption, PostOption.post_id == Post.id).
outerjoin(SimilarPostOption, PostOption.val == SimilarPostOption.val).
join(SimilarPost,SimilarPost.id == SimilarPostOption.post_id).
filter(Place.id == Post.place_id).
filter(self.radius_cond()).
group_by(Post.id).
group_by(Place.id).
order_by(desc(func.count(SimilarPost.id))).
order_by(desc(Post.timestamp))
).subquery().select()
all_posts = db.session.query(Post).select_from(filter.pick()).all()
I did a test printout with
print [x.place.name for x in all_posts]
[u'placeB', u'placeB', u'placeB', u'placeC', u'placeC', u'placeA']
How can I fix this?
Thanks!
This should get you what you want:
SimilarPost = aliased(Post)
SimilarPostOption = aliased(PostOption)
post_popularity = (db.session.query(func.count(SimilarPost.id))
.select_from(PostOption)
.filter(PostOption.post_id == Post.id)
.correlate(Post)
.outerjoin(SimilarPostOption, PostOption.val == SimilarPostOption.val)
.join(SimilarPost, sql.and_(
SimilarPost.id == SimilarPostOption.post_id,
SimilarPost.place_id == Post.place_id)
)
.as_scalar())
popular_post_id = (db.session.query(Post.id)
.filter(Post.place_id == Place.id)
.correlate(Place)
.order_by(post_popularity.desc())
.limit(1)
.as_scalar())
deduped_posts = (db.session.query(Post, post_popularity)
.join(Place)
.filter(Post.id == popular_post_id)
.order_by(post_popularity.desc(), Post.timestamp.desc())
.all())
I can't speak to the runtime performance with large data sets, and there may be a better solution, but that's what I managed to synthesize from quite a few sources (MySQL JOIN with LIMIT 1 on joined table, SQLAlchemy - subquery in a WHERE clause, SQLAlchemy Query documentation). The biggest complicating factor is that you apparently need to use as_scalar
to nest the subqueries in the right places, and therefore can't return both the Post id and the count from the same subquery.
FWIW, this is kind of a behemoth and I concur with user1675804 that SQLAlchemy code this deep is hard to grok and not very maintainable. You should take a hard look at any more low-tech solutions available like adding columns to the db or doing more of the work in python code.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With