I have a database relationship with a Many-To-Many association but the association table itself contains a lot of attributes that need to be accessed, so I made three classes:
class User(Base):
    id = Column(Integer, primary_key=True)
    attempts = relationship("UserAttempt", backref="user", lazy="subquery")

class Challenge(Base):
    id = Column(Integer, primary_key=True)
    attempts = relationship("UserAttempt", backref="challenge", lazy="subquery")

class UserAttempt(Base):
    challenge_id = Column(Integer, ForeignKey('challenge.id'), primary_key=True)
    user_id = Column(Integer, ForeignKey('user.id'), primary_key=True)
This is a simplified case, of course, where I left out the other attributes that I need to access. The purpose here is that each User can attempt any number of Challenges, hence the UserAttempt table, which describes one particular user working on one challenge.
The problem now: when I query for all Users and then look at each attempt, I am perfectly fine. But when I look at the challenge for an attempt, it explodes into numerous subqueries. Of course, this is bad for performance.
What I actually want from SQLAlchemy is to pull all (or all relevant) Challenges at once and then associate them with the relevant attempts. It is not a big deal whether all challenges are pulled or only those which have an actual association, as the number of challenges is only between 100 and 500.
My solution right now is not very elegant: I pull all relevant attempts, challenges and users separately and then associate them by hand. I loop through all attempts, add each one to its challenge and user, then set the challenge and user on the attempt as well. That seems to me like a brute-force solution that should not be necessary.
However, every approach I tried (e.g. varying "lazy" parameters, altered queries, etc.) has led to hundreds or even thousands of queries. I have also tried to write plain SQL queries that would yield my desired results and have come up with something along the lines of SELECT * FROM challenge WHERE id IN (SELECT challenge_id FROM attempts), and that worked well, but I cannot get it translated to SQLAlchemy.
Thank you very much in advance for any guidance you may have to offer.
"What I actually want from SQLAlchemy is to pull all (or all relevant) Challenges at once and then associate them with the relevant attempts. It is not a big deal whether all challenges are pulled or only those which have an actual association, ..."
First, take the lazy='subquery' directive off relationship(); hard-wiring relationships to always eager load everything is why you're getting the explosion of queries. Specifically, you're getting the Challenge->attempts eager load once for each lazy load of UserAttempt->challenge, so you've sort of designed the worst possible loading combination here :).
With that fixed, there are two approaches.
One is to keep in mind that a many-to-one association is, in the usual case, looked up by primary key in the Session's in-memory identity map first, and if the object is present, no SQL is emitted. So I think you could get exactly the effect you're describing using a technique I use often:
all_challenges = session.query(Challenge).all()
for user in some_users:  # however you got these
    for attempt in user.attempts:  # however you got these
        do_something_with(attempt.challenge)  # no SQL will be emitted
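To see the identity-map lookup in action, here is a minimal, self-contained sketch. It assumes SQLAlchemy 1.4 or later and an in-memory SQLite database; the table names, the sample rows and the `statements` list used to record emitted SQL are illustrative assumptions, not part of the original question.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class User(Base):
    __tablename__ = "user"  # assumed table name
    id = Column(Integer, primary_key=True)
    attempts = relationship("UserAttempt", backref="user")


class Challenge(Base):
    __tablename__ = "challenge"  # assumed table name
    id = Column(Integer, primary_key=True)
    attempts = relationship("UserAttempt", backref="challenge")


class UserAttempt(Base):
    __tablename__ = "user_attempt"  # assumed table name
    challenge_id = Column(Integer, ForeignKey("challenge.id"), primary_key=True)
    user_id = Column(Integer, ForeignKey("user.id"), primary_key=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

statements = []  # every SQL string sent to the database is appended here


@event.listens_for(engine, "before_cursor_execute")
def record(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


# Sample data: two users who each attempt three challenges.
with Session(engine) as session:
    session.add_all([User(id=u) for u in (1, 2)])
    session.add_all([Challenge(id=c) for c in (1, 2, 3)])
    session.add_all(
        [UserAttempt(user_id=u, challenge_id=c) for u in (1, 2) for c in (1, 2, 3)]
    )
    session.commit()

with Session(engine) as session:
    # Keep a reference so the Challenge objects stay in the identity map.
    all_challenges = session.query(Challenge).all()
    attempts = session.query(UserAttempt).all()

    statements.clear()
    # Each many-to-one access is resolved by primary key from the Session.
    seen = [attempt.challenge.id for attempt in attempts]
    lazy_selects = len(statements)
```

Holding on to `all_challenges` matters: the Session only references its objects weakly, so if the list were discarded the challenges could be garbage collected and lazily re-fetched one by one.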
If you wanted to use the above approach with exactly the "Select * from challenge where id in (select challenge_id from attempt)":
all_challenges = session.query(Challenge).\
    filter(Challenge.id.in_(session.query(UserAttempt.challenge_id))).all()
though this is likely more efficient as a JOIN:
all_challenges = session.query(Challenge).\
    join(Challenge.attempts).all()
or with DISTINCT, since the join returns one row per matching UserAttempt, so the same challenge.id comes back once for every attempt that refers to it:
all_challenges = session.query(Challenge).distinct().\
    join(Challenge.attempts).all()
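The three variants can be compared side by side with a small sketch. It again assumes SQLAlchemy 1.4 or later with in-memory SQLite, and the table names and sample rows are made up for illustration: challenge 3 has no attempts, while challenges 1 and 2 are each attempted by two users.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class User(Base):
    __tablename__ = "user"  # assumed table name
    id = Column(Integer, primary_key=True)
    attempts = relationship("UserAttempt", backref="user")


class Challenge(Base):
    __tablename__ = "challenge"  # assumed table name
    id = Column(Integer, primary_key=True)
    attempts = relationship("UserAttempt", backref="challenge")


class UserAttempt(Base):
    __tablename__ = "user_attempt"  # assumed table name
    challenge_id = Column(Integer, ForeignKey("challenge.id"), primary_key=True)
    user_id = Column(Integer, ForeignKey("user.id"), primary_key=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([User(id=u) for u in (1, 2)])
    session.add_all([Challenge(id=c) for c in (1, 2, 3)])
    session.add_all(
        [UserAttempt(user_id=u, challenge_id=c) for u in (1, 2) for c in (1, 2)]
    )
    session.commit()

    via_in = session.query(Challenge).filter(
        Challenge.id.in_(session.query(UserAttempt.challenge_id))
    )
    via_join = session.query(Challenge).join(Challenge.attempts)
    via_distinct = session.query(Challenge).distinct().join(Challenge.attempts)

    # All three select the same challenges; the unattempted one is left out.
    ids_in = {c.id for c in via_in}
    ids_join = {c.id for c in via_join}
    ids_distinct = {c.id for c in via_distinct}

    # count() wraps each query in SELECT count(*), exposing the raw row counts.
    join_rows = via_join.count()
    distinct_rows = via_distinct.count()
```

Here the plain join produces one database row per attempt, while DISTINCT collapses them to one row per challenge, which is the reason for adding it when many attempts share a challenge.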
The other way is to use eager loading more specifically. You can load a bunch of users, attempts and challenges with one query that will emit three SELECT statements:
users = session.query(User).\
    options(subqueryload(User.attempts).subqueryload(UserAttempt.challenge)).all()
or because UserAttempt->Challenge is many-to-one, a join might be better:
users = session.query(User).\
    options(subqueryload(User.attempts).joinedload(UserAttempt.challenge)).all()
or, starting just from UserAttempt:
attempts = session.query(UserAttempt).\
    options(joinedload(UserAttempt.challenge)).all()
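A quick way to check that an eager-loading option does what you expect is to record the SQL that is emitted and then walk the object graph. The following is a rough sketch under the same assumptions as before (SQLAlchemy 1.4 or later, in-memory SQLite, made-up table names and sample rows, and a hypothetical `statements` list as the recorder).

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, event
from sqlalchemy.orm import (
    Session,
    declarative_base,
    joinedload,
    relationship,
    subqueryload,
)

Base = declarative_base()


class User(Base):
    __tablename__ = "user"  # assumed table name
    id = Column(Integer, primary_key=True)
    attempts = relationship("UserAttempt", backref="user")


class Challenge(Base):
    __tablename__ = "challenge"  # assumed table name
    id = Column(Integer, primary_key=True)
    attempts = relationship("UserAttempt", backref="challenge")


class UserAttempt(Base):
    __tablename__ = "user_attempt"  # assumed table name
    challenge_id = Column(Integer, ForeignKey("challenge.id"), primary_key=True)
    user_id = Column(Integer, ForeignKey("user.id"), primary_key=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

statements = []  # every SQL string sent to the database is appended here


@event.listens_for(engine, "before_cursor_execute")
def record(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(engine) as session:
    session.add_all([User(id=u) for u in (1, 2)])
    session.add_all([Challenge(id=c) for c in (1, 2)])
    session.add_all(
        [UserAttempt(user_id=u, challenge_id=c) for u in (1, 2) for c in (1, 2)]
    )
    session.commit()

with Session(engine) as session:
    statements.clear()
    users = (
        session.query(User)
        .options(subqueryload(User.attempts).joinedload(UserAttempt.challenge))
        .all()
    )
    emitted_by_load = len(statements)

    # Walking users -> attempts -> challenge should not need further SQL.
    pairs = sorted((u.id, a.challenge.id) for u in users for a in u.attempts)
    emitted_by_traversal = len(statements) - emitted_by_load
```

If `emitted_by_traversal` is greater than zero in your own application, some relationship along the path is still being lazy loaded and the option chain needs another look.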