SQLAlchemy filter on list attribute

Tags:

I have the following model defined with Flask-SQLAlchemy:

"""models.py"""

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

skill_candidate = db.Table(
    'SkillCandidate',
    db.Column('skill_id', db.String, db.ForeignKey('skill.id')),
    db.Column('candidate_id', db.Integer, db.ForeignKey('candidate.id')))

class Candidate(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    skills = db.relationship("Skill", secondary=skill_candidate)

class Skill(db.Model):
    id = db.Column(db.String, primary_key=True)
    name = db.Column(db.String, nullable=False, unique=True)

What am trying to achieve is the following : I want to return all the candidates who possess skills provided in a list input (even ideally, a list of skill_id)

I tried the following :

def get_skilled_candidates(skill_ids):
    return Candidate.query.join(skill_candidate).\
       filter(and_(*[skill_candidate.c.skill_id == skill_id for skill_id in skill_ids])).\
            all()

The aim was to filter all candidates for every skill and compose it with a and_ statement

It works well if I use a list of 1 item (it returns all candidates that possess the skill) but does not if I add more skills in the input list (even tho I have candidates in base that fit the criteria)

879

asked Sep 24 '19 12:09

AugBar

Video Answer

2 Answers

As noted in the comments, what you'd need is a FORALL operation (universal quantifier), or relational division.

FORALL x ( p(x) )

can be expressed as

NOT ( EXISTS x ( NOT ( p(x) ) ) )

which is a bit unwieldy and hard to reason about, if you don't know about FORALL and their relationship. Given your models it could look like:

def get_skilled_candidates(skill_ids):
    # Form a temporary derived table using unions
    skills = db.union_all(*[
        db.select([db.literal(sid).label('skill_id')])
        for sid in skill_ids]).alias()

    return Candidate.query.\
        filter(
            ~db.exists().select_from(skills).where(
                ~db.exists().
                    where(db.and_(skill_candidate.c.skill_id == skills.c.skill_id,
                                  skill_candidate.c.candidate_id == Candidate.id)).
                    correlate_except(skill_candidate))).\
        all()

There are of course other ways to express the same query, such as:

def get_skilled_candidates(skill_ids):
    return Candidate.query.\
        join(skill_candidate).\
        filter(skill_candidate.c.skill_id.in_(skill_ids)).\
        group_by(Candidate.id).\
        having(db.func.count(skill_candidate.c.skill_id.distinct()) ==
               len(set(skill_ids))).\
        all()

which essentially checks by count that all skill ids were matched.

If using Postgresql you could also do:

from sqlalchemy.dialects.postgresql import array_agg

def get_skilled_candidates(skill_ids):
    # The double filtering may seem redundant, but the WHERE ... IN allows
    # the query to use indexes, while the HAVING ... @> does the final filtering.
    return Candidate.query.\
        join(skill_candidate).\
        filter(skill_candidate.c.skill_id.in_(skill_ids)).\
        group_by(Candidate.id).\
        having(array_agg(skill_candidate.c.skill_id).contains(skill_ids)).\
        all()

This is somewhat equivalent with the partly Python solution from the other answer.

Also, the aggregate EVERY could be used:

def get_skilled_candidates(skill_ids):
    # Form a temporary derived table using unions
    skills = db.union_all(*[
        db.select([db.literal(sid).label('skill_id')])
        for sid in skill_ids]).alias()

    # Perform a CROSS JOIN between candidate and skills
    return Candidate.query.\
        join(skills, db.true()).\
        group_by(Candidate.id).\
        having(db.func.every(
            db.exists().
                where(db.and_(skill_candidate.c.skill_id == skills.c.skill_id,
                              skill_candidate.c.candidate_id == Candidate.id)).
                correlate_except(skill_candidate))).\
        all()

answered Nov 14 '22 22:11

Ilja Everilä

You could query all candidates with any of the skills in your list and then filter the result with a list comprehension. This may not be as performant as the relational division approach mentioned by @IljaEverilä, but it certainly simplifies the query aspect.

skill_ids = ['id_1', 'id_2']
candidates = session.query(Candidate).\
    filter(Candidate.skills.any(Skill.id.in_(skill_ids)).\
    all()

candidates = [
    c for c in candidates
    if set(s.id for s in c.skills).issuperset(skill_ids)
]

answered Nov 14 '22 23:11

benvc

Related questions
                            
                                Python - Spread **kwargs into a dictionary similar to ES6 spread
                            
                                Airflow DAG Scheduled date is a week behind
                            
                                How to define specific number of convolutional kernels/filters in pytorch?
                            
                                How to load images larger than MAX_IMAGE_PIXELS with PIL?
                            
                                Unable to change axis titles in plotly surface plot in python
                            
                                Pyspark - Cumulative sum with reset condition
                            
                                Swap column values based on a condition in pandas
                            
                                How do I combine large csv files in python?
                            
                                Why can't I import functions in bert after pip install bert
                            
                                OAuth throws "missing code validator" in Google OAuth2
                            
                                Save Pandas data frame to Google Cloud bucket
                            
                                How to properly scale frequency axis in Fast Fourier Transform?
                            
                                How can I parse a Wikipedia XML dump with Python?
                            
                                Target array shape different to expected output using Tensorflow
                            
                                Skip Flask logging for one endpoint?
                            
                                python: pycodestyle (ex pep8) vs pylint strictness
                            
                                Why would the loss decrease while the accuracy stays the same?
                            
                                Docker-compose logs are only showing "Attaching to" and nothing else
                            
                                Drop all rows in Pandas DataFrame where value is NOT NaN
                            
                                Django Background tasks vs Celery

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

SQLAlchemy filter on list attribute

Tags:

python

sqlalchemy

flask-sqlalchemy