Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is SQLAlchemy count() much slower than the raw query?

I'm using SQLAlchemy with a MySQL database and I'd like to count the rows in a table (roughly 300k). The SQLAlchemy count function takes about 50 times as long to run as writing the same query directly in MySQL. Am I doing something wrong?

# this takes over 3 seconds to return session.query(Segment).count() 

However:

SELECT COUNT(*) FROM segments; +----------+ | COUNT(*) | +----------+ |   281992 | +----------+ 1 row in set (0.07 sec) 

The difference in speed increases with the size of the table (it is barely noticeable under 100k rows).

Update

Using session.query(Segment.id).count() instead of session.query(Segment).count() seems to do the trick and get it up to speed. I'm still puzzled why the initial query is slower though.

like image 220
mtth Avatar asked Feb 07 '13 15:02

mtth


People also ask

Why is SQLAlchemy so slow?

At the ORM level, the speed issues are because creating objects in Python is slow, and the SQLAlchemy ORM applies a large amount of bookkeeping to these objects as it fetches them, which is necessary in order for it to fulfill its usage contract, including unit of work, identity map, eager loading, collections, etc.

How do you count in SQLAlchemy?

Implementing GroupBy and count in SQLAlchemyPass the SQL query to the execute() function and get all the results using fetchall() function. Use a for loop to iterate through the results.

Should I use SQLAlchemy core or ORM?

If you want to view your data in a more schema-centric view (as used in SQL), use Core. If you have data for which business objects are not needed, use Core. If you view your data as business objects, use ORM. If you are building a quick prototype, use ORM.


1 Answers

Unfortunately MySQL has terrible, terrible support of subqueries and this is affecting us in a very negative way. The SQLAlchemy docs point out that the "optimized" query can be achieved using query(func.count(Segment.id)):

Return a count of rows this Query would return.

This generates the SQL for this Query as follows:

SELECT count(1) AS count_1 FROM (      SELECT <rest of query follows...> ) AS anon_1 

For fine grained control over specific columns to count, to skip the usage of a subquery or otherwise control of the FROM clause, or to use other aggregate functions, use func expressions in conjunction with query(), i.e.:

from sqlalchemy import func  # count User records, without # using a subquery. session.query(func.count(User.id))  # return count of user "id" grouped # by "name" session.query(func.count(User.id)).\         group_by(User.name)  from sqlalchemy import distinct  # count distinct "name" values session.query(func.count(distinct(User.name))) 
like image 190
zzzeek Avatar answered Oct 18 '22 13:10

zzzeek