I have the following model defined with Flask-SQLAlchemy:
"""models.py"""
from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy()
skill_candidate = db.Table(
'SkillCandidate',
db.Column('skill_id', db.String, db.ForeignKey('skill.id')),
db.Column('candidate_id', db.Integer, db.ForeignKey('candidate.id')))
class Candidate(db.Model):
id = db.Column(db.Integer, primary_key=True)
skills = db.relationship("Skill", secondary=skill_candidate)
class Skill(db.Model):
id = db.Column(db.String, primary_key=True)
name = db.Column(db.String, nullable=False, unique=True)
What am trying to achieve is the following : I want to return all the candidates who possess skills provided in a list input (even ideally, a list of skill_id)
I tried the following :
def get_skilled_candidates(skill_ids):
return Candidate.query.join(skill_candidate).\
filter(and_(*[skill_candidate.c.skill_id == skill_id for skill_id in skill_ids])).\
all()
The aim was to filter all candidates for every skill and compose it with a and_ statement
It works well if I use a list of 1 item (it returns all candidates that possess the skill) but does not if I add more skills in the input list (even tho I have candidates in base that fit the criteria)
SQLAlchemy ORM - Filter Operators 1 Equals. The usual operator used is == and it applies the criteria to check equality. ... 2 Not Equals. The operator used for not equals is != and it provides not equals criteria. ... 3 Like. ... 4 IN. ... 5 AND. ... 6 OR. ...
methodsqlalchemy.orm.Query.from_self(*entities)¶ return a Query that selects from this Query’s SELECT statement. Deprecated since version 1.4: The Query.from_self()method is considered legacy as of the 1.x series of SQLAlchemy and will be removed in 2.0.
classsqlalchemy.orm. Query(entities, session=None)¶ ORM-level SQL construction object. Queryis the source of all SELECT statements generated by the ORM, both those formulated by end-user query operations as well as by high level internal operations such as related collection loading.
Resultset represented by Query object can be subjected to certain criteria by using filter () method. The general usage of filter method is as follows − In the following example, resultset obtained by SELECT query on Customers table is filtered by a condition, (ID>2) − This statement will translate into following SQL expression −
As noted in the comments, what you'd need is a FORALL
operation (universal quantifier), or relational division.
FORALL x ( p(x) )
can be expressed as
NOT ( EXISTS x ( NOT ( p(x) ) ) )
which is a bit unwieldy and hard to reason about, if you don't know about FORALL
and their relationship. Given your models it could look like:
def get_skilled_candidates(skill_ids):
# Form a temporary derived table using unions
skills = db.union_all(*[
db.select([db.literal(sid).label('skill_id')])
for sid in skill_ids]).alias()
return Candidate.query.\
filter(
~db.exists().select_from(skills).where(
~db.exists().
where(db.and_(skill_candidate.c.skill_id == skills.c.skill_id,
skill_candidate.c.candidate_id == Candidate.id)).
correlate_except(skill_candidate))).\
all()
There are of course other ways to express the same query, such as:
def get_skilled_candidates(skill_ids):
return Candidate.query.\
join(skill_candidate).\
filter(skill_candidate.c.skill_id.in_(skill_ids)).\
group_by(Candidate.id).\
having(db.func.count(skill_candidate.c.skill_id.distinct()) ==
len(set(skill_ids))).\
all()
which essentially checks by count that all skill ids were matched.
If using Postgresql you could also do:
from sqlalchemy.dialects.postgresql import array_agg
def get_skilled_candidates(skill_ids):
# The double filtering may seem redundant, but the WHERE ... IN allows
# the query to use indexes, while the HAVING ... @> does the final filtering.
return Candidate.query.\
join(skill_candidate).\
filter(skill_candidate.c.skill_id.in_(skill_ids)).\
group_by(Candidate.id).\
having(array_agg(skill_candidate.c.skill_id).contains(skill_ids)).\
all()
This is somewhat equivalent with the partly Python solution from the other answer.
Also, the aggregate EVERY
could be used:
def get_skilled_candidates(skill_ids):
# Form a temporary derived table using unions
skills = db.union_all(*[
db.select([db.literal(sid).label('skill_id')])
for sid in skill_ids]).alias()
# Perform a CROSS JOIN between candidate and skills
return Candidate.query.\
join(skills, db.true()).\
group_by(Candidate.id).\
having(db.func.every(
db.exists().
where(db.and_(skill_candidate.c.skill_id == skills.c.skill_id,
skill_candidate.c.candidate_id == Candidate.id)).
correlate_except(skill_candidate))).\
all()
You could query all candidates with any of the skills in your list and then filter the result with a list comprehension. This may not be as performant as the relational division approach mentioned by @IljaEverilä, but it certainly simplifies the query aspect.
skill_ids = ['id_1', 'id_2']
candidates = session.query(Candidate).\
filter(Candidate.skills.any(Skill.id.in_(skill_ids)).\
all()
candidates = [
c for c in candidates
if set(s.id for s in c.skills).issuperset(skill_ids)
]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With