Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Profiling SQL query

I'm using sqlalchemy (expression language, not full ORM) with MySQL and experiencing some unexpected slowness. Particularly, the time spent performing a select query in by sqlalchemy is ten times greater than the time spent performing the same query from the mysql command line.

Output from cprofile:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
100  206.703    2.067  206.703    2.067 {method 'query' of '_mysql.connection' objects}

MySQL time: 0.26 seconds

The consensus seems to be that there is some overhead using sqlalchemy, but not nearly this much. Any suggestions as to what could cause behavior like this?

The queries are generally of the form:

SELECT fieldnames.minage, fieldnames.maxage, fieldnames.race,    
fieldnames.sex, sum( pop.population ) AS pop, pop.zip5
FROM pop
INNER JOIN fieldnames ON fieldnames.fieldname = pop.fieldname_id
WHERE fieldnames.race IN ("White alone")
AND fieldnames.sex IN ("Female")
AND fieldnames.maxage >=101
AND fieldnames.minage <=107
GROUP BY fieldnames.minage, fieldnames.maxage
like image 683
AAmeliorant Avatar asked Nov 13 '22 23:11

AAmeliorant


1 Answers

One possible reason for slowness - Does sql alchemy use prepared statements? If yes, then a reason why you may be experiencing a difference in performance is because the mysql optimizer has different information when creating the two query plans.

When you run the query from the command-line, the mysql optimizer has the complete query with all where clause values filled in (as you showed above3), thus can optimize explicitly for these values.

When you run from sql alchemy, the mysql optimizer may only see this (perhaps fieldnames.race and fieldnames.sex are parameterized as well):

SELECT fieldnames.minage, fieldnames.maxage, fieldnames.race,    
fieldnames.sex, sum( pop.population ) AS pop, pop.zip5
FROM pop
INNER JOIN fieldnames ON fieldnames.fieldname = pop.fieldname_id
WHERE fieldnames.race IN ("White alone")
AND fieldnames.sex IN ("Female")
AND fieldnames.maxage >= ?
AND fieldnames.minage <= ?
GROUP BY fieldnames.minage, fieldnames.maxage

Thus the optimizer has to make a guess on what values you might use then optimize around that. Unfortunately, it may make a bad guess, and thus in a worst case create a query plan that makes the query run significantly slower than you expect.

like image 55
Gareth Avatar answered Nov 17 '22 05:11

Gareth