Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQLAlchemy: How to filter after aggregation

How do I filter out all groups with aggregate share totals of 0?

q = session.query(Trades.ticker, func.sum(Trades.shares))
g = q.group_by(Trades.ticker)
f = g.filter(func.sum(Trades.shares) != 0)
result = f.all()

This throws the error on line 3:

... sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) misuse of aggregate: sum() ...

A workaround is to remove the filter and use a list comprehension:

result = [x for x in g.all() if x[1] != 0.0]

But this is very slow compared to filtering a SQL query, because it loads all the data into memory as opposed to only loading the filtered data.

(EDITED to simplify.)

like image 362
ChaimG Avatar asked Nov 21 '17 17:11

ChaimG


People also ask

What command do you use to filter data after aggregating?

To filter records using the aggregate function (the function SUM from earlier), use the HAVING clause. To calculate the sum of values for each group of rows, use the aggregation SUM function.

What is difference between filter and filter by in SQLAlchemy?

The second one, filter_by(), may be used only for filtering by something specifically stated - a string or some number value. So it's usable only for category filtering, not for expression filtering. On the other hand filter() allows using comparison expressions (==, <, >, etc.)

What does First () do in SQLAlchemy?

Return the first result of this Query or None if the result doesn't contain any row. first() applies a limit of one within the generated SQL, so that only one primary entity row is generated on the server side (note this may consist of multiple result rows if join-loaded collections are present).

What is lazy loading SQLAlchemy?

Lazy loading refers to objects are returned from a query without the related objects loaded at first. When the given collection or reference is first accessed on a particular object, an additional SELECT statement is emitted such that the requested collection is loaded.


1 Answers

Okay, the answer is to use having(). See related.

q = session.query(Trades.ticker, func.sum(Trades.shares))
g = q.group_by(Trades.ticker)
f = g.having(func.sum(Trades.shares) != 0)
result = f.all()

Nice, simple, and fast.

Writing out the question helped me think it through clearer and helped me find the answer in the documentation. I'm leaving this here in case it helps someone.

(EDITED to reflect revised question.)

like image 98
ChaimG Avatar answered Oct 22 '22 10:10

ChaimG