I have the following tables in sqlalchemy :-
class Post(Base):
__tablename__ = 'posts'
id = Column(Integer, primary_key=True)
compare_url =Column(String(200))
url = Column(String(200))
postedby = Column(Integer)
category = Column(String(50))
title = Column(String(500),nullable=False)
author = Column(String(500),default="Unspecified")
content = Column(Text(),default="could not fetch this content you will have to read it externally")
summary = Column(Text())
time = Column(TIMESTAMP(),default=now())
post_type=Column(Text())
Reads = relationship("Read", backref="Post")
Reposts = relationship("RePost", backref="Post")
Votes = relationship("Vote", backref="Post")
class Read(Base):
__tablename__ = 'reads'
id = Column(Integer, primary_key=True)
post_read = Column(Integer, ForeignKey('posts.id'))
#post = relationship("Post", backref=backref('Reads', order_by=id))
time = Column(TIMESTAMP(),default=now())
user_id = Column(String(50))
class Vote(Base):
__tablename__ = 'votes'
id = Column(Integer, primary_key=True)
post_read = Column(Integer, ForeignKey('posts.id'))
time = Column(TIMESTAMP(),default=now())
user_id = Column(String(50))
user_vote = Column(Boolean(),nullable=False)
I have this query
posts = session.query(Post, func.count(Read.id).label('total'),func.sum(Vote.user_vote).label('votes'),User.username).outerjoin(Post.Reads).outerjoin(Post.Votes)
i am trying to get the number of votes and the number of times a post has been read. A vote Value can be -1 or 1
The problem is i am getting the same value for number of Reads and votes on each Post
for example when my reads table has
id post_read time user_id
1 7 2012-09-19 09:32:06 1
and votes table has
id post_read time user_id user_vote
1 7 [->] 2012-09-19 09:42:27 1 1
2 7 [->] 2012-09-19 09:42:27 2 1
But i am still getting the value for votes and reads as two.
It might look as if you can solve this particular problem by simply replacing func.count(Read.id).label('total') with func.count(func.distinct(Read.id)).label('total'). And in fact this will solve the issue with number of reads.
But if you suddenly get another reader for your post (and end up with 2 readers and 2 voters), then all your votes will also be counted twice.
The best solution to this is simply not to aggreate different items in the same query. You can use subqueries to solve this:
subq_read = (session.query(
Post.id,
func.count(Read.id).label("total_read")
).
outerjoin(Post.Reads).
group_by(Read.post_read)
).subquery()
subq_vote = (session.query(
Post.id,
func.sum(Vote.user_vote).label("total_votes")
).
outerjoin(Post.Votes).
group_by(Vote.post_read)
).subquery()
posts = (session.query(
Post,
subq_read.c.total_read,
subq_vote.c.total_votes,
).
outerjoin(subq_read, subq_read.c.id == Post.id).
outerjoin(subq_vote, subq_vote.c.id == Post.id)
.group_by(Post)
)
Note: you have a User.username in your query, but I did not see any join clause in the query. You might want to check this as well.
When joining multiple tables, tables that join earlier get their rows repeated for tables that join later in one-to-many relationships (to put it simply). This is why your count is off. In joins like this, you always need to find something distinct to count in the result set... such as the primary keys. I find this preferable to subqueries as it is much faster. In fact, much of the performance tuning I do comes from eliminating subqueries.
Thus, if you filter on the user_vote column to eliminate the records you don't want to count, you can fix your query like this:
posts = session.query(Post
, func.count(distinct(Read.id)).label('total')
, func.count(distinct(Vote.id)).label('votes')
, User.username
) \
.outerjoin(Post.Reads) \
.outerjoin(Post.Votes) \
.filter(Votes.user_vote == True)
But, you'll probably also want to add a group_by, or another filter, to that as well to get counts per Post, your likely goal.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With