Sqlalchemy using func with outerjoin on multiple tables

I have the following tables in SQLAlchemy:

class Post(Base):
    __tablename__ = 'posts'
    id = Column(Integer, primary_key=True)
    compare_url =Column(String(200))
    url = Column(String(200))
    postedby = Column(Integer)
    category = Column(String(50))
    title  = Column(String(500),nullable=False)
    author  = Column(String(500),default="Unspecified")
    content = Column(Text(),default="could not fetch this content you will have to read it externally")
    summary = Column(Text())
    time = Column(TIMESTAMP(),default=now())
    post_type=Column(Text())
    Reads = relationship("Read", backref="Post")
    Reposts = relationship("RePost", backref="Post")
    Votes = relationship("Vote", backref="Post")



class Read(Base):
    __tablename__ = 'reads'
    id = Column(Integer, primary_key=True)
    post_read = Column(Integer, ForeignKey('posts.id'))
    #post = relationship("Post", backref=backref('Reads', order_by=id))
    time = Column(TIMESTAMP(),default=now())
    user_id = Column(String(50))


class Vote(Base):
    __tablename__ = 'votes'
    id = Column(Integer, primary_key=True)
    post_read = Column(Integer, ForeignKey('posts.id'))
    time = Column(TIMESTAMP(),default=now())
    user_id = Column(String(50))
    user_vote = Column(Boolean(),nullable=False)

I have this query:

posts = (
    session.query(
        Post,
        func.count(Read.id).label('total'),
        func.sum(Vote.user_vote).label('votes'),
        User.username,
    )
    .outerjoin(Post.Reads)
    .outerjoin(Post.Votes)
)

I am trying to get the number of votes and the number of times each post has been read. A vote value can be -1 or 1.

The problem is that I am getting the same value for the number of reads and the number of votes on each post.

For example, when my reads table has

id  post_read  time                 user_id
1   7          2012-09-19 09:32:06  1

and my votes table has

id  post_read  time                 user_id  user_vote
1   7          2012-09-19 09:42:27  1        1
2   7          2012-09-19 09:42:27  2        1

I still get two for both votes and reads.

Madawar asked Jun 28 '26 23:06


2 Answers

It might look as if you can solve this particular problem by simply replacing func.count(Read.id).label('total') with func.count(func.distinct(Read.id)).label('total'). And in fact this will fix the number of reads.

But as soon as the post gets a second reader (so it has 2 readers and 2 voters), every vote will be counted twice as well.
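To make the double counting concrete, here is a minimal sketch that uses the standard-library sqlite3 module instead of SQLAlchemy (an assumption, chosen only so it runs without extra dependencies). Table and column names mirror the question; user_vote is stored as an integer, and the sample rows are made up for illustration: one post with two reads and two votes.

```python
import sqlite3

# In-memory database with a cut-down version of the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE reads (id INTEGER PRIMARY KEY, post_read INTEGER, user_id TEXT);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, post_read INTEGER,
                        user_id TEXT, user_vote INTEGER);
    INSERT INTO posts VALUES (7, 'example');
    INSERT INTO reads VALUES (1, 7, '1'), (2, 7, '2');
    INSERT INTO votes VALUES (1, 7, '1', 1), (2, 7, '2', 1);
""")

# Both outer joins hang off the same post, so every read row is paired
# with every vote row: 2 reads x 2 votes = 4 joined rows.
fanout = conn.execute("""
    SELECT COUNT(*),              -- joined rows
           COUNT(r.id),           -- naive read count
           COUNT(DISTINCT r.id),  -- distinct read count
           SUM(v.user_vote)       -- vote sum over the joined rows
    FROM posts p
    LEFT OUTER JOIN reads r ON r.post_read = p.id
    LEFT OUTER JOIN votes v ON v.post_read = p.id
    GROUP BY p.id
""").fetchone()

print(fanout)  # (4, 4, 2, 4)
```

The distinct count recovers the true number of reads (2), but the vote sum is 4 instead of 2, because SUM has no distinct key to fall back on.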

The best solution is simply not to aggregate different items in the same query. You can use subqueries instead:

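# Count reads per post in its own subquery.
subq_read = (
    session.query(
        Post.id,
        func.count(Read.id).label("total_read"),
    )
    .outerjoin(Post.Reads)
    .group_by(Post.id)  # group by the selected key so posts without reads keep their own row
    .subquery()
)

# Sum votes per post in a second, independent subquery.
subq_vote = (
    session.query(
        Post.id,
        func.sum(Vote.user_vote).label("total_votes"),
    )
    .outerjoin(Post.Votes)
    .group_by(Post.id)
    .subquery()
)

# Join both aggregates back to Post. Each subquery has one row per post,
# so the joins cannot multiply rows and no outer group_by is needed.
posts = (
    session.query(
        Post,
        subq_read.c.total_read,
        subq_vote.c.total_votes,
    )
    .outerjoin(subq_read, subq_read.c.id == Post.id)
    .outerjoin(subq_vote, subq_vote.c.id == Post.id)
)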

Note: your query selects User.username, but I do not see a join to User anywhere in it. You might want to check that as well.
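The subquery approach can be checked outside SQLAlchemy with a small sketch in plain SQL, run through the standard-library sqlite3 module (an assumption made so it runs without extra dependencies). It is a simplification rather than the exact SQL SQLAlchemy would emit: each subquery aggregates its child table directly. Post 7 carries the question's data (one read, two votes); post 8 is a hypothetical post with no activity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE reads (id INTEGER PRIMARY KEY, post_read INTEGER, user_id TEXT);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, post_read INTEGER,
                        user_id TEXT, user_vote INTEGER);
    INSERT INTO posts VALUES (7, 'read and voted'), (8, 'untouched');
    INSERT INTO reads VALUES (1, 7, '1');
    INSERT INTO votes VALUES (1, 7, '1', 1), (2, 7, '2', 1);
""")

# Each aggregate is computed separately, then joined back to posts.
# Every subquery yields at most one row per post, so nothing is multiplied.
rows = conn.execute("""
    SELECT p.id, r.total_read, v.total_votes
    FROM posts p
    LEFT OUTER JOIN (
        SELECT post_read, COUNT(id) AS total_read
        FROM reads GROUP BY post_read
    ) r ON r.post_read = p.id
    LEFT OUTER JOIN (
        SELECT post_read, SUM(user_vote) AS total_votes
        FROM votes GROUP BY post_read
    ) v ON v.post_read = p.id
    ORDER BY p.id
""").fetchall()

print(rows)  # [(7, 1, 2), (8, None, None)]
```

Post 7 now reports one read and a vote total of two. The untouched post comes back with NULLs (None in Python); wrapping the columns in COALESCE(..., 0) is one way to turn those into zeros if that suits the caller better.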

van answered Jun 30 '26 12:06


When joining multiple tables, tables that join earlier get their rows repeated for tables that join later in one-to-many relationships (to put it simply). This is why your count is off. In joins like this, you always need to find something distinct to count in the result set, such as the primary keys. I prefer this to subqueries because, in my experience, it is often much faster. In fact, much of the performance tuning I do comes from eliminating subqueries.

Thus, if you filter on the user_vote column to eliminate the records you don't want to count, you can fix your query like this:

from sqlalchemy import distinct, func

posts = session.query(Post
    ,   func.count(distinct(Read.id)).label('total')
    ,   func.count(distinct(Vote.id)).label('votes')
    ,   User.username
    ) \
    .outerjoin(Post.Reads) \
    .outerjoin(Post.Votes) \
    .filter(Vote.user_vote == True)  # keep only the votes you want to count

But you will probably also want to add a group_by (or another filter) to get counts per Post, which is likely your goal.
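Here is a rough sketch of the distinct-count approach with a per-post GROUP BY, written as plain SQL and run through the standard-library sqlite3 module (an assumption made so it runs without extra dependencies). The sample rows are hypothetical: post 7 has two reads, two positive votes and one other vote stored as 0; post 8 has one read and no votes. It also compares two places to put the vote filter, since a WHERE condition on an outer-joined table removes posts that have no matching vote.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE reads (id INTEGER PRIMARY KEY, post_read INTEGER);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, post_read INTEGER,
                        user_vote INTEGER);
    INSERT INTO posts VALUES (7, 'popular'), (8, 'read but never voted on');
    INSERT INTO reads VALUES (1, 7), (2, 7), (3, 8);
    INSERT INTO votes VALUES (1, 7, 1), (2, 7, 1), (3, 7, 0);
""")

# Filter in WHERE: post 8 has no vote row, so v.user_vote is NULL there
# and the whole post is filtered out of the result.
filtered = conn.execute("""
    SELECT p.id, COUNT(DISTINCT r.id), COUNT(DISTINCT v.id)
    FROM posts p
    LEFT OUTER JOIN reads r ON r.post_read = p.id
    LEFT OUTER JOIN votes v ON v.post_read = p.id
    WHERE v.user_vote = 1
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

# Filter in the join condition: posts without a matching vote are kept,
# with a vote count of zero.
on_clause = conn.execute("""
    SELECT p.id, COUNT(DISTINCT r.id), COUNT(DISTINCT v.id)
    FROM posts p
    LEFT OUTER JOIN reads r ON r.post_read = p.id
    LEFT OUTER JOIN votes v ON v.post_read = p.id AND v.user_vote = 1
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

print(filtered)   # [(7, 2, 2)]
print(on_clause)  # [(7, 2, 2), (8, 1, 0)]
```

Counting distinct primary keys gives the right totals for post 7 in both variants. Which filter placement is appropriate depends on whether posts without positive votes should appear in the result at all.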

juanitogan answered Jun 30 '26 11:06