SQLAlchemy filter on list attribute

I have the following model defined with Flask-SQLAlchemy:

"""models.py"""

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

skill_candidate = db.Table(
    'SkillCandidate',
    db.Column('skill_id', db.String, db.ForeignKey('skill.id')),
    db.Column('candidate_id', db.Integer, db.ForeignKey('candidate.id')))

class Candidate(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    skills = db.relationship("Skill", secondary=skill_candidate)

class Skill(db.Model):
    id = db.Column(db.String, primary_key=True)
    name = db.Column(db.String, nullable=False, unique=True)

What I am trying to achieve is the following: I want to return all the candidates who possess every skill in a given input list (ideally a list of skill_id values).

I tried the following:

def get_skilled_candidates(skill_ids):
    return Candidate.query.join(skill_candidate).\
       filter(and_(*[skill_candidate.c.skill_id == skill_id for skill_id in skill_ids])).\
            all()

The aim was to filter candidates on each skill and combine the conditions with an and_ statement.

It works well with a list of one item (it returns all candidates that possess that skill), but it returns nothing as soon as I add more skills to the input list, even though the database contains candidates that match all of them.

AugBar asked Sep 24 '19 12:09




2 Answers

As noted in the comments, what you'd need is a FORALL operation (universal quantifier), or relational division.

FORALL x ( p(x) )

can be expressed as

NOT ( EXISTS x ( NOT ( p(x) ) ) )

which is a bit unwieldy and hard to reason about if you are not familiar with FORALL and how it relates to EXISTS. Given your models, it could look like:

def get_skilled_candidates(skill_ids):
    # Form a temporary derived table using unions
    skills = db.union_all(*[
        db.select([db.literal(sid).label('skill_id')])
        for sid in skill_ids]).alias()

    return Candidate.query.\
        filter(
            ~db.exists().select_from(skills).where(
                ~db.exists().
                    where(db.and_(skill_candidate.c.skill_id == skills.c.skill_id,
                                  skill_candidate.c.candidate_id == Candidate.id)).
                    correlate_except(skill_candidate))).\
        all()
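
To see the double negation at work outside SQLAlchemy, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the sample rows and the helper name candidates_with_all are assumptions made for illustration, and the SQL is hand-written to mirror the shape of the query above; it is not the statement SQLAlchemy would emit.

```python
import sqlite3

# Hypothetical in-memory schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidate (id INTEGER PRIMARY KEY);
    CREATE TABLE skill_candidate (skill_id TEXT, candidate_id INTEGER);
    INSERT INTO candidate (id) VALUES (1), (2), (3);
    INSERT INTO skill_candidate (skill_id, candidate_id) VALUES
        ('py', 1), ('sql', 1), ('go', 1),
        ('py', 2),
        ('sql', 3), ('py', 3);
""")

def candidates_with_all(skill_ids):
    """Return ids of candidates for whom no wanted skill is missing."""
    skill_ids = list(skill_ids)
    if not skill_ids:
        # FORALL over an empty set is vacuously true: every candidate matches.
        rows = conn.execute("SELECT id FROM candidate ORDER BY id")
        return [row[0] for row in rows]
    # Derived table of wanted skills: SELECT ? UNION ALL SELECT ? ...
    wanted = " UNION ALL ".join("SELECT ? AS skill_id" for _ in skill_ids)
    sql = f"""
        SELECT c.id FROM candidate AS c
        WHERE NOT EXISTS (
            SELECT 1 FROM ({wanted}) AS w
            WHERE NOT EXISTS (
                SELECT 1 FROM skill_candidate AS sc
                WHERE sc.skill_id = w.skill_id
                  AND sc.candidate_id = c.id))
        ORDER BY c.id
    """
    return [row[0] for row in conn.execute(sql, skill_ids)]

result = candidates_with_all(["py", "sql"])
```

Read from the inside out: the inner NOT EXISTS is true for a wanted skill the candidate lacks, and the outer NOT EXISTS rejects any candidate for whom such a skill exists.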

There are of course other ways to express the same query, such as:

def get_skilled_candidates(skill_ids):
    return Candidate.query.\
        join(skill_candidate).\
        filter(skill_candidate.c.skill_id.in_(skill_ids)).\
        group_by(Candidate.id).\
        having(db.func.count(skill_candidate.c.skill_id.distinct()) ==
               len(set(skill_ids))).\
        all()

which essentially checks by count that all skill ids were matched.
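
The counting idea can be tried in isolation with Python's built-in sqlite3 module. This is only a sketch: the schema, the sample rows (including a deliberately duplicated row) and the helper name candidates_by_count are assumptions for illustration, and the SQL is hand-written rather than taken from SQLAlchemy's output.

```python
import sqlite3

# Hypothetical in-memory data; candidate 2 has a duplicated 'py' row on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE skill_candidate (skill_id TEXT, candidate_id INTEGER);
    INSERT INTO skill_candidate (skill_id, candidate_id) VALUES
        ('py', 1), ('sql', 1), ('go', 1),
        ('py', 2), ('py', 2),
        ('sql', 3), ('py', 3);
""")

def candidates_by_count(skill_ids):
    """Return ids of candidates matching every distinct wanted skill."""
    wanted = sorted(set(skill_ids))
    if not wanted:
        # With nothing to match, the IN filter keeps no rows at all.
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT sc.candidate_id
        FROM skill_candidate AS sc
        WHERE sc.skill_id IN ({placeholders})
        GROUP BY sc.candidate_id
        HAVING COUNT(DISTINCT sc.skill_id) = ?
        ORDER BY sc.candidate_id
    """
    return [row[0] for row in conn.execute(sql, wanted + [len(wanted)])]

result = candidates_by_count(["py", "sql"])
```

The DISTINCT in the count and the set() around the input both guard against duplicates: without them, candidate 2's repeated 'py' row or a repeated id in the input list could make the two counts agree or disagree for the wrong reason.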

If using PostgreSQL you could also do:

from sqlalchemy.dialects.postgresql import array_agg

def get_skilled_candidates(skill_ids):
    # The double filtering may seem redundant, but the WHERE ... IN allows
    # the query to use indexes, while the HAVING ... @> does the final filtering.
    return Candidate.query.\
        join(skill_candidate).\
        filter(skill_candidate.c.skill_id.in_(skill_ids)).\
        group_by(Candidate.id).\
        having(array_agg(skill_candidate.c.skill_id).contains(skill_ids)).\
        all()

This is somewhat equivalent to the partly Python solution from the other answer.

Also, the aggregate EVERY could be used:

def get_skilled_candidates(skill_ids):
    # Form a temporary derived table using unions
    skills = db.union_all(*[
        db.select([db.literal(sid).label('skill_id')])
        for sid in skill_ids]).alias()

    # Perform a CROSS JOIN between candidate and skills
    return Candidate.query.\
        join(skills, db.true()).\
        group_by(Candidate.id).\
        having(db.func.every(
            db.exists().
                where(db.and_(skill_candidate.c.skill_id == skills.c.skill_id,
                              skill_candidate.c.candidate_id == Candidate.id)).
                correlate_except(skill_candidate))).\
        all()
Ilja Everilä answered Nov 14 '22 22:11


You could query all candidates with any of the skills in your list and then filter the result with a list comprehension. This may not be as performant as the relational division approach mentioned by @IljaEverilä, but it certainly simplifies the query aspect.

skill_ids = ['id_1', 'id_2']
candidates = session.query(Candidate).\
    filter(Candidate.skills.any(Skill.id.in_(skill_ids))).\
    all()

candidates = [
    c for c in candidates
    if set(s.id for s in c.skills).issuperset(skill_ids)
]
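
The two steps can be illustrated without a database. In this sketch the dictionary candidate_skills is hypothetical sample data standing in for the query result, with each candidate id mapped to its set of skill ids.

```python
# Hypothetical sample data: candidate id -> set of skill ids.
candidate_skills = {
    1: {"id_1", "id_2", "id_3"},
    2: {"id_1"},
    3: {"id_4"},
}
skill_ids = ["id_1", "id_2"]

# Step 1 (the role of the database query): keep candidates with ANY wanted skill.
prefiltered = {
    cid: skills
    for cid, skills in candidate_skills.items()
    if skills & set(skill_ids)
}

# Step 2 (done in Python): keep candidates whose skills cover ALL wanted skills.
matching = sorted(
    cid for cid, skills in prefiltered.items()
    if skills.issuperset(skill_ids)
)
```

The prefilter only reduces how many rows are loaded; the issuperset check is what enforces the "all skills" requirement.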
benvc answered Nov 14 '22 23:11