Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Do SQL bind parameters affect performance?

Suppose I have a table called Projects with a column called Budget with a standard B-Tree index. The table has 50,000 projects, and only 1% of them have a Budget of over one million. If I ran the SQL Query:

SELECT * From Projects WHERE Budget > 1000000;

The planner will use an index range scan on Budget to get the rows off the heap table. However, if I use the query:

SELECT * From Projects WHERE Budget > 50;

The planner will most likely do a sequential scan on the table, as it will know this query will end up returning most or all rows anyway and there's no reason to load all the pages of the index into memory.

Now, let's say I run the query:

SELECT * From Projects WHERE Budget > :budget;

Where :budget is a bind parameter passed into my database. From what I've read, the query as above will be cached, and no data on cardinality can be inferred. In fact, most databases will just assume an even distribution and the cached query plan will reflect that. This surprised me, as usually when you read about the benefits of bind parameters it's on the subject of preventing SQL injection attacks.

Obviously, this could improve performance if the resulting query plan would be the same, as a new plan wouldn't have to be compiled, but could also hurt performance if the values of :budget greatly varied.

My Question: Why are bind parameters not resolved before the query plan is generated and cached? Shouldn't modern databases strive to generate the best plan for the query, which should mean looking at the value for each parameter and getting accurate index stats?

Note: This question probably doesn't apply to mySql as mySql doesn't cache SQL plans. However, I'm interested in why this is the case on Postgres, Oracle and MS SQL.

like image 381
Mike Christensen Avatar asked Sep 05 '12 16:09

Mike Christensen


2 Answers

For Oracle specifically, it depends.

For quite some time (at least 9i), Oracle has supported bind variable peeking. That means that the first time a query is executed, the optimizer peeks at the value of the bind variable and bases its cardinality estimates on the value of that first bind variable. That makes sense in cases where most of the executions of a query are going to have bind variable values that return similarly sized results. If 99% of the queries are using small budget values, it is highly likely that the first execution will use a small value and thus the cached query plan will be appropriate for small bind variable values. Of course, that means that when you do specify a large bind variable value (or, worse, if you get lucky and the first execution is with a large value) you'll get less than optimal query plans.

If you are using 11g, Oracle can use adaptive cursor sharing. This allows the optimizer to maintain multiple query plans for a single query and to pick the appropriate plan based on the bind variable values. That can get rather complicated over time, though. If you have a query with N bind variables, the optimizer has to figure out how to partition that N-dimensional space into different query plans for different bind variable values in order to figure out when and whether to re-optimize a query for a new set of bind variable values and when to simply reuse an earlier plan. A lot of that work ends up being done at night during the nightly maintenance window in order to avoid incurring those costs during the productive day. But that also brings up issues about how much freedom the DBA wants to give the database to evolve plans over time vs how much the DBA wants to control plans so that the database doesn't suddenly start picking a poor plan that causes some major system to slow to a crawl on a random day.

like image 105
Justin Cave Avatar answered Nov 07 '22 05:11

Justin Cave


This surprised me, as usually when you read about the benefits of bind parameters it's on the subject of preventing SQL injection attacks.

Don't confuse parameterized queries with prepared statements. Both offer parameterization, but prepared statements offer the additional caching of the query plan.

Why are bind parameters not resolved before the query plan is generated and cached?

Because sometimes generating the query plan is an expensive step. Prepared statements allow you to amortize the cost of query planning.

However, if all you're looking for is SQL injection protection, don't use prepared statements. Use parameterized queries.

For example, in PHP, you can use http://php.net/pg_query_params to execute a parameterized query WITHOUT caching the query plan; meanwhile http://php.net/pg_prepare and http://php.net/pg_execute are used to cache a plan for a prepared statement and later execute it.

Edit: 9.2 apparently changes the way prepared statements are planned

like image 21
Frank Farmer Avatar answered Nov 07 '22 05:11

Frank Farmer