I am having hard time optimizing my SQLAlchemy queries. My SQL knowledge is very basic, and I just can't get the stuff I need from the SQLAlchemy docs.
Suppose the following very basic one-to-many relationship:
class Parent(Base):
__tablename__ = "parents"
id = Column(Integer, primary_key = True)
children = relationship("Child", backref = "parent")
class Child(Base):
__tablename__ = "children"
id = Column(Integer, primary_key = True)
parent_id = Column(Integer, ForeignKey("parents.id"))
naughty = Column(Boolean)
How could I:
(Parent, count_of_naughty_children, count_of_all_children)
for each parent?After decent time spent googling, I found how to query those values separately:
# The following returns tuples of (Parent, count_of_all_children):
session.query(Parent, func.count(Child.id)).outerjoin(Child, Parent.children).\
group_by(Parent.id)
# The following returns tuples of (Parent, count_of_naughty_children):
al = aliased(Children, session.query(Children).filter_by(naughty = True).\
subquery())
session.query(Parent, func.count(al.id)).outerjoin(al, Parent.children).\
group_by(Parent.id)
I tried to combine them in different ways, but didn't manage to get what I want.
I guess this query is going to be based on the previous one, filtering by naughty/all ratio.
Any help is appreciated.
EDIT : Thanks to Antti Haapala's help, I found solution to the second question:
avg = func.avg(func.coalesce(Child.naughty, 0)) # coalesce() treats NULLs as 0
# avg = func.avg(Child.naughty) - if you want to ignore NULLs
session.query(Parent).join(Child, Parent.children).group_by(Parent).\
having(avg > 0.8)
It finds average if children's naughty
variable, treating False and NULLs as 0, and True as 1. Tested with MySQL backend, but should work on others, too.
You can count multiple COUNT() for multiple conditions in a single query using GROUP BY. SELECT yourColumnName,COUNT(*) from yourTableName group by yourColumnName; To understand the above syntax, let us first create a table.
Instead of returning entire model instances, SQLAlchemy can fetch only the columns we're interested in. This not only reduces the amount of data sent, but also avoids the need to instantiate entire objects. Working with tuples of column data instead of models can be quite a bit faster.
Implementing alias in SQLAlchemy SQL alias is a method of giving a temporary name for a table that is more convenient and readable. SQL alias facilitates a simple name to be used in place of a complex table name when it has to be used multiple times in a query.
the count()
sql aggretate function is pretty simple; it gives you the total number of non-null values in each group. With that in mind, we can adjust your query to give you the proper result.
print (Query([
Parent,
func.count(Child.id),
func.count(case(
[((Child.naughty == True), Child.id)], else_=literal_column("NULL"))).label("naughty")])
.join(Parent.children).group_by(Parent)
)
Which produces the following sql:
SELECT
parents.id AS parents_id,
count(children.id) AS count_1,
count(CASE WHEN (children.naughty = 1)
THEN children.id
ELSE NULL END) AS naughty
FROM parents
JOIN children ON parents.id = children.parent_id
GROUP BY parents.id
If your query is only to get the parents who have > 80 % children naughty, you can on most databases cast the naughty
to integer, then take average of it; then having
this average greater than 0.8
.
Thus you get something like
from sqlalchemy.sql.expression import cast
naughtyp = func.avg(cast(Child.naughty, Integer))
session.query(Parent, func.count(Child.id), naughtyp).join(Child)\
.group_by(Parent.id).having(naughtyp > 0.8).all()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With