Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQLAlchemy: several counts in one query

I am having hard time optimizing my SQLAlchemy queries. My SQL knowledge is very basic, and I just can't get the stuff I need from the SQLAlchemy docs.

Suppose the following very basic one-to-many relationship:

class Parent(Base):
    __tablename__ = "parents"
    id = Column(Integer, primary_key = True)
    children = relationship("Child", backref = "parent")

class Child(Base):
    __tablename__ = "children"
    id = Column(Integer, primary_key = True)
    parent_id = Column(Integer, ForeignKey("parents.id"))
    naughty = Column(Boolean)

How could I:

  • Query tuples of (Parent, count_of_naughty_children, count_of_all_children) for each parent?

After decent time spent googling, I found how to query those values separately:

# The following returns tuples of (Parent, count_of_all_children):
session.query(Parent, func.count(Child.id)).outerjoin(Child, Parent.children).\
    group_by(Parent.id)
# The following returns tuples of (Parent, count_of_naughty_children):
al = aliased(Children, session.query(Children).filter_by(naughty = True).\
    subquery())
session.query(Parent, func.count(al.id)).outerjoin(al, Parent.children).\
    group_by(Parent.id)

I tried to combine them in different ways, but didn't manage to get what I want.

  • Query all parents which have more than 80% naughty children? Edit: naughty could be NULL.

I guess this query is going to be based on the previous one, filtering by naughty/all ratio.

Any help is appreciated.

EDIT : Thanks to Antti Haapala's help, I found solution to the second question:

avg = func.avg(func.coalesce(Child.naughty, 0)) # coalesce() treats NULLs as 0
# avg = func.avg(Child.naughty) - if you want to ignore NULLs
session.query(Parent).join(Child, Parent.children).group_by(Parent).\
    having(avg > 0.8)

It finds average if children's naughty variable, treating False and NULLs as 0, and True as 1. Tested with MySQL backend, but should work on others, too.

like image 269
kreo Avatar asked Jul 23 '14 16:07

kreo


People also ask

Can we use 2 count in SQL?

You can count multiple COUNT() for multiple conditions in a single query using GROUP BY. SELECT yourColumnName,COUNT(*) from yourTableName group by yourColumnName; To understand the above syntax, let us first create a table.

How make SQLAlchemy faster?

Instead of returning entire model instances, SQLAlchemy can fetch only the columns we're interested in. This not only reduces the amount of data sent, but also avoids the need to instantiate entire objects. Working with tuples of column data instead of models can be quite a bit faster.

What is aliased in SQLAlchemy?

Implementing alias in SQLAlchemy SQL alias is a method of giving a temporary name for a table that is more convenient and readable. SQL alias facilitates a simple name to be used in place of a complex table name when it has to be used multiple times in a query.


2 Answers

the count() sql aggretate function is pretty simple; it gives you the total number of non-null values in each group. With that in mind, we can adjust your query to give you the proper result.

print (Query([
    Parent,
    func.count(Child.id),
    func.count(case(
        [((Child.naughty == True), Child.id)], else_=literal_column("NULL"))).label("naughty")])

    .join(Parent.children).group_by(Parent)
    )

Which produces the following sql:

SELECT 
 parents.id AS parents_id, 
 count(children.id) AS count_1, 
 count(CASE WHEN (children.naughty = 1) 
       THEN children.id 
       ELSE NULL END) AS naughty 
FROM parents 
JOIN children ON parents.id = children.parent_id 
GROUP BY parents.id
like image 91
SingleNegationElimination Avatar answered Nov 05 '22 08:11

SingleNegationElimination


If your query is only to get the parents who have > 80 % children naughty, you can on most databases cast the naughty to integer, then take average of it; then having this average greater than 0.8.

Thus you get something like

from sqlalchemy.sql.expression import cast

naughtyp = func.avg(cast(Child.naughty, Integer))
session.query(Parent, func.count(Child.id), naughtyp).join(Child)\
    .group_by(Parent.id).having(naughtyp > 0.8).all()