Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQLAlchemy: filter by membership in at least one many-to-many related table

Using SQLAlchemy 0.7.1 and a MySQL 5.1 database, I've got a many-to-many relationship set up as follows:

user_groups = Table('user_groups', Base.metadata,
    Column('user_id', String(128), ForeignKey('users.username')),
    Column('group_id', Integer, ForeignKey('groups.id'))
)

class ZKUser(Base, ZKTableAudit):
    __tablename__ = 'users'

    username   = Column(String(128), primary_key=True)
    first_name = Column(String(512))
    last_name  = Column(String(512))

    groups = relationship(ZKGroup, secondary=user_groups, backref='users')

class ZKGroup(Base, ZKTableAudit):
    __tablename__ = 'groups'

    id          = Column(Integer, primary_key=True)
    name        = Column(String(512))

Users can belong to multiple Groups, and Groups can contain multiple Users.

What I'm trying to do is build a SQLAlchemy query that returns only the Users who belong to at least one Group out of a list of Groups.

I played around with the in_ function, but that only seems to work for testing scalar values for membership in a list. I'm not much of a SQL writer, so I don't even know what kind of SELECT statement this would require.

like image 534
coredumperror Avatar asked Jun 25 '11 00:06

coredumperror


3 Answers

OK, after a lot of research, I realized that it was my own ignorance of SQL terminology that was holding me back. My search for a solution to find users belonging to "at least one of" the list of groups should have been to find users belonging to "any" of the list of groups. The any ORM function from SQLAlchemy does exactly what I needed, like so:

session.query(ZKUser).filter(ZKUser.groups.any(ZKGroup.id.in_([1,2,3])))

That code emits this SQL (on MySQL 5.1):

SELECT * FROM users 
WHERE EXISTS (
    SELECT 1 FROM user_groups, groups 
    WHERE users.id = user_groups.contact_id 
        AND groups.id = user_groups.group_id 
        AND groups.id IN (%s, %s, %s)
    )
like image 134
coredumperror Avatar answered Nov 19 '22 08:11

coredumperror


According to the docs for any, the query will run faster if you use an explicit join instead:

Because any() uses a correlated subquery, its performance is not nearly as good when compared against large target tables as that of using a join.

In your case, you could do something like:

users = (
    session.query(ZKUser)
    .join(user_groups)
    .filter(user_groups.columns.group_id.in_([1, 2, 3]))
)

This emits SQL like:

SELECT *
FROM users
JOIN user_groups ON users.id = user_groups.user_id 
WHERE user_groups.group_id IN (1, 2, 3)
like image 19
z0r Avatar answered Nov 19 '22 10:11

z0r


You can use in_:

session.query(ZKUser).filter(ZKGroup.id.in_([1,2])).all()
like image 2
zeekay Avatar answered Nov 19 '22 08:11

zeekay