My department was recently reprimanded (nicely) by our IT department for running queries with very high costs, on the premise that our queries have a real possibility of destabilizing and/or crashing the database. None of us are DBAs; we're just researchers who write and execute queries against the database, and I'm probably the only one who ever looked at an explain plan before the reprimand.
We were told that query costs over 100 should be very rare, and queries with costs over 1000 should never be run. The problems I am running into are that cost seems to have no correlation with execution time, and that I'm losing productivity while trying to optimize my queries.
As an example, I have a query that executes in under 5 seconds with a cost of 10844. I rewrote the query to use a view that contains most of the information I need, and got the cost down to 109, but the new query, which retrieves the same results, takes 40 seconds to run. I found a question here with a possible explanation:
Measuring Query Performance : "Execution Plan Query Cost" vs "Time Taken"
That question led me to parallelism hints. I tried using /*+ no_parallel*/ in the cost 10844 query, but neither the cost nor the execution time changed, so I'm not sure that parallelism explains the faster execution time but higher cost. Then I tried the /*+ parallel(n)*/ hint and found that the higher the value of n, the lower the cost of the query. In the case of the cost 10844 query, I found that /*+ parallel(140)*/ dropped the cost to 97, with a very minor increase in execution time.
This seemed like an ideal "cheat" to meet the requirements that our IT department set forth, but then I read this:
http://www.oracle.com/technetwork/articles/datawarehouse/twp-parallel-execution-fundamentals-133639.pdf
The article contains this sentence:
Parallel execution can enable a single operation to utilize all system resources.
So, my questions are:
Am I actually placing more strain on the server resources by using the /*+ parallel(n)*/ hint with a very high degree of parallelism, even though I am lowering the cost?
Assuming no parallelism, is execution speed a better measure of resources used than cost?
Using SQL Server Management Studio: In Object Explorer, right-click a server and select Properties. Click the Advanced node. Under Parallelism, change the Cost Threshold for Parallelism option to the value you want. Type or select a value from 0 to 32767.
What you are experiencing is called caching. The database doesn't have to go to disk the 2nd time because it can either get the data from its own buffer cache, or the operating system/disk array can also provide the data faster from its own cache.
Parallel query is a method used to increase the execution speed of SQL queries by creating multiple query processes that divide the workload of a SQL statement and execute it in parallel, that is, at the same time.
The rule your DBA gave you doesn't make a lot of sense. Worrying about the cost that is reported for a query is very seldom productive. First, you cannot directly compare the cost of two different queries: one query that has a cost in the millions may run very quickly and consume very few system resources, while another query that has a cost in the hundreds may run for hours and bring the server to its knees. Second, cost is an estimate. If the optimizer made an accurate estimate of the cost, that strongly implies that it has come up with the optimal query plan, which would mean that it is unlikely that you'd be able to modify the query to return the same results while using fewer resources. If the optimizer made an inaccurate estimate of the cost, that strongly implies that it has come up with a poor query plan, in which case the reported cost would have no relationship to any useful metric you'd come up with. Most of the time, the queries you're trying to optimize are the queries where the optimizer generated an incorrect query plan because it incorrectly estimated the cost of various steps.
Tricking the optimizer by using hints that may or may not actually change the query plan (depending on how parallelism is configured, for example) is very unlikely to solve a problem. It's much more likely to cause the optimizer's estimates to be less accurate and make it more likely that it chooses a query plan that consumes far more resources than it needs to. A parallel hint with a high degree of parallelism, for example, would tell Oracle to drastically reduce the cost of a full table scan, which makes it more likely that the optimizer would choose that over an index scan. That is seldom something that your DBAs would want to see.
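One way to see this effect for yourself is to compare the plans Oracle produces with and without the hint, looking at the operations chosen rather than only the cost column. The following is a minimal sketch: the table research_results and its column study_id are hypothetical names used purely for illustration, and the degree of 8 is an arbitrary choice.

```sql
-- Hypothetical table and column names; substitute your own query.

-- Plan without parallelism.
EXPLAIN PLAN FOR
SELECT /*+ no_parallel */ COUNT(*)
FROM   research_results
WHERE  study_id = 42;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Same query with a statement-level parallel hint (degree 8 is arbitrary).
EXPLAIN PLAN FOR
SELECT /*+ parallel(8) */ COUNT(*)
FROM   research_results
WHERE  study_id = 42;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the second plan shows a lower cost but has switched from an index access to a full table scan under PX operations, the lower number reflects a changed estimate, not less work for the server.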
If you're looking for a single metric that tells you whether a query plan is reasonable, I'd use the amount of logical I/O. Logical I/O is correlated pretty well with actual query performance and with the amount of resources your query consumes. Looking at execution time can be problematic because it varies significantly based on what data happens to be cached (which is why queries often run much faster the second time they're executed), while logical I/O doesn't change based on what data is in cache. It also lets you scale your expectations as the number of rows your queries need to process changes. If you're writing a query that needs to aggregate data from 1 million rows, for example, that should consume far more resources than a query that needs to return 100 rows of data from a table with no aggregation. If you're looking at logical I/O, you can easily scale your expectations to the size of the problem to figure out how efficient your queries could realistically be.
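As a rough sketch of how logical I/O might be checked in practice: if you have been granted SELECT access to V$SQL (an assumption; many non-DBA accounts are not), you can tag a query with a comment and then look up its cumulative buffer gets. The tag research_query_42 below is a hypothetical marker you would put in your own statement.

```sql
-- Assumes the query of interest was run with a marker comment such as
--   SELECT /* research_query_42 */ ...
-- and that your account can read V$SQL.
SELECT sql_id,
       executions,
       buffer_gets,                                   -- logical I/O, cumulative
       rows_processed,                                -- rows returned, cumulative
       ROUND(buffer_gets / NULLIF(rows_processed, 0), 1) AS gets_per_row
FROM   v$sql
WHERE  sql_text LIKE '%research_query_42%'
AND    sql_text NOT LIKE '%v$sql%';                   -- exclude this lookup itself
```

If V$SQL is not available to you, SQL*Plus offers SET AUTOTRACE TRACEONLY STATISTICS, which reports "consistent gets" and "db block gets" for each statement; their sum is the logical I/O. That also requires a role your DBAs would have to grant.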
In Christian Antognini's "Troubleshooting Oracle Performance" (page 450), for example, he gives a rule of thumb that is pretty reasonable
Different systems with different data models may merit tweaking the buckets a bit but those are likely to be good starting points.
My guess is that if you're researchers that are not developers, you're probably running queries that need to aggregate or fetch relatively large data sets, at least in comparison to those that application developers are commonly writing. If you're scanning a million rows of data to generate some aggregate results, your queries are naturally going to consume far more resources than an application developer whose queries are reading or writing a handful of rows. You may be writing queries that are just as efficient from a logical I/O per row perspective, you just may be looking at many more rows.
If you are running queries against the live production database, you may well be in a situation where it makes sense to start segregating workload. Most organizations reach a point where running reporting queries against the live database starts to create issues for the production system. One common solution to this sort of problem is to create a separate reporting database that is fed from the production system (either via a nightly snapshot or by an ongoing replication process) where reporting queries can run without disturbing the production application. Another common solution is to use something like Oracle Resource Manager to limit the amount of resources available to one group of users (in this case, report developers) in order to minimize the impact on higher priority users (in this case, users of the production system).
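To give a sense of what the Resource Manager option involves, here is a minimal sketch of a plan a DBA might create. The group name REPORTING_GROUP, the plan name DAYTIME_PLAN, the 20/80 split, and the parallel limit of 4 are all illustrative assumptions, not recommendations, and this is something your DBAs would set up rather than something you would run yourselves.

```sql
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  -- Hypothetical consumer group for research/reporting sessions.
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    consumer_group => 'REPORTING_GROUP',
    comment        => 'Ad hoc research and reporting queries');

  -- Hypothetical plan name.
  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    plan    => 'DAYTIME_PLAN',
    comment => 'Favor production sessions over reporting');

  -- Give reporting a smaller CPU share and cap its degree of parallelism.
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan                     => 'DAYTIME_PLAN',
    group_or_subplan         => 'REPORTING_GROUP',
    comment                  => 'Limited share for reporting',
    mgmt_p1                  => 20,
    parallel_degree_limit_p1 => 4);

  -- Every plan needs a directive for OTHER_GROUPS.
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'All other sessions',
    mgmt_p1          => 80);

  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
```

The sketch leaves out two steps a real setup needs: mapping the researchers' accounts to the consumer group and activating the plan through the RESOURCE_MANAGER_PLAN parameter.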