I'm using SQLAlchemy with a MySQL database and I'd like to count the rows in a table (roughly 300k). The SQLAlchemy count function takes about 50 times as long to run as writing the same query directly in MySQL. Am I doing something wrong?
# this takes over 3 seconds to return session.query(Segment).count()
However:
SELECT COUNT(*) FROM segments; +----------+ | COUNT(*) | +----------+ | 281992 | +----------+ 1 row in set (0.07 sec)
The difference in speed increases with the size of the table (it is barely noticeable under 100k rows).
Update
Using session.query(Segment.id).count()
instead of session.query(Segment).count()
seems to do the trick and get it up to speed. I'm still puzzled why the initial query is slower though.
At the ORM level, the speed issues are because creating objects in Python is slow, and the SQLAlchemy ORM applies a large amount of bookkeeping to these objects as it fetches them, which is necessary in order for it to fulfill its usage contract, including unit of work, identity map, eager loading, collections, etc.
Implementing GroupBy and count in SQLAlchemyPass the SQL query to the execute() function and get all the results using fetchall() function. Use a for loop to iterate through the results.
If you want to view your data in a more schema-centric view (as used in SQL), use Core. If you have data for which business objects are not needed, use Core. If you view your data as business objects, use ORM. If you are building a quick prototype, use ORM.
Unfortunately MySQL has terrible, terrible support of subqueries and this is affecting us in a very negative way. The SQLAlchemy docs point out that the "optimized" query can be achieved using query(func.count(Segment.id))
:
Return a count of rows this Query would return.
This generates the SQL for this Query as follows:
SELECT count(1) AS count_1 FROM ( SELECT <rest of query follows...> ) AS anon_1
For fine grained control over specific columns to count, to skip the usage of a subquery or otherwise control of the FROM clause, or to use other aggregate functions, use func expressions in conjunction with query(), i.e.:
from sqlalchemy import func # count User records, without # using a subquery. session.query(func.count(User.id)) # return count of user "id" grouped # by "name" session.query(func.count(User.id)).\ group_by(User.name) from sqlalchemy import distinct # count distinct "name" values session.query(func.count(distinct(User.name)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With