There are times when I can use the FORCE INDEX option to make a query use a particular index and run faster.
Then, after some time, the data in that table may change, and the index I forced may no longer be the right one for that query.
My questions are:
Is there any way to suggest to the query optimizer that it treat the forced index as one possible option during plan generation? If the plan using the forced index would be slow, it could fall back to normal query planning.
Or else, is there any way to edit the source code of MySQL/PostgreSQL so that the optimizer treats the forced index as one candidate during plan generation?
Additional info: I want to add my plan to the optimizer's list of candidate plans (the optimizer already generates many plans for a query), so that while choosing the best plan for the query, the optimizer considers my plan as well. If this were possible, the optimizer would not need to obey the forced index every time; it could keep the forced index as just one option during plan generation.
I have tried to make the question as clear as I can. If anything is unclear, please ask in the comments.
Some background on the query optimizer: the executable strategies it produces are known as query plans. The optimizer generates multiple query plans for a single query and determines the most efficient plan to run; there are often many different ways to search a database.
The goal of query optimization is to improve query processing, which means the optimizer needs to take into account the runtime effect of different alternatives. This is done by estimating the cost of executing each alternative.
The only way is to remove FORCE INDEX from your query.
You can override the optimizer's plan, or else you let the optimizer choose the plan.
The optimizer can't tell if using the index will be slow. It tries to make its best guess at choosing a plan that will be fast.
The only way the optimizer could be more accurate is to run both queries, and compare the performance difference, and then run the fast query. That's obviously not practical, so the optimizer is meant to estimate the cost of the query, based on some statistical model. Most of the time this is accurate enough that the optimizer's choice is best.
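In practice, you can get a rough view of what the optimizer estimates for each alternative with EXPLAIN. A minimal sketch, where the table, column, and index names are hypothetical:

```sql
-- Let the optimizer choose the plan, and note the estimated rows/key:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Force the index and compare the estimates side by side:
EXPLAIN SELECT * FROM orders FORCE INDEX (idx_customer_id)
WHERE customer_id = 42;
```

Comparing the two outputs tells you what the optimizer believes about each plan, though as noted above, those are estimates from a statistical model, not measured runtimes.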
The general concept with which you are dealing is "internal database statistics" (not an official term). That is, the information the database engine uses to make the query plan. The statistics store details like the key distribution within the indexes, table row sizes, counts, percentage of null rows, distinctness of data, and so on.
For historical (and still relevant) performance reasons, the internal database statistics are not updated on-the-fly, but are typically updated periodically at the DBA's behest. In Postgres land, this information is succinctly available via the pg_statistic table. Unfortunately, MySQL does not make this information as readily available, but you can see what you can glean from the INFORMATION_SCHEMA and SHOW INDEXES.
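As a sketch of where to look (the table and column names below are hypothetical):

```sql
-- PostgreSQL: per-column statistics the planner uses;
-- pg_stats is the human-readable view over pg_statistic.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'orders';

-- MySQL: index cardinality estimates for the same hypothetical table.
SHOW INDEX FROM orders;
SELECT index_name, column_name, cardinality
FROM INFORMATION_SCHEMA.STATISTICS
WHERE table_name = 'orders';
```

If the cardinality figures look wildly different from the real data distribution, that is a strong hint the statistics are stale.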
The likely cause of your problem at hand -- the fleeting usefulness of query plan hints -- is that the DB's statistics are no longer representative of the table data. The general remedy, then, is to update the DB's statistics for the tables in question, after which you won't need to provide optimizer hints in your queries:
ANALYZE TABLE <table_name>;
There are some caveats of which to be aware:
Without extra measures, this analysis by the DB engine is ephemeral: it will not persist across a database restart/reboot. The options are to reanalyze tables upon DB start, or to look into statistics persistence.
This generates statistics by sampling the data -- looking at a subset of random table rows. By nature, random sampling means that the statistics gathered might, on any particular run, not be a good representation of the actual data, leading to poor query plans. You can increase the number of sampled pages via innodb_stats_persistent_sample_pages.
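For example, a sketch of raising the sampling depth and then re-gathering statistics (the table name is hypothetical, and 64 is an arbitrary illustration, not a recommendation):

```sql
-- Server-wide default for InnoDB persistent statistics sampling:
SET GLOBAL innodb_stats_persistent_sample_pages = 64;
ANALYZE TABLE orders;

-- Or per table, via the STATS_SAMPLE_PAGES table option:
ALTER TABLE orders STATS_SAMPLE_PAGES = 64;
```

More sampled pages means slower ANALYZE runs but more representative statistics.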
ANALYZE is not OPTIMIZE, and your tables might also need optimization. OPTIMIZE is an expensive operation, requiring a lock on the table, but can be hugely beneficial under certain conditions.
ANALYZE is table specific, and cannot make up for poor table layout. For example, I was recently tasked with speeding up a slow-running set of report queries. The culprit? The schema was storing time and date columns as strings, rather than as native data types. This had huge implications, as the queries first had to cast the strings to dates (coping with errors in the data) before comparison, resulting in multiple full table scans. Very bad.
ANALYZE cannot make up for poor queries. For example, as with the previous bullet point, something like WHERE SOME_FUNCTION( col ) > some_value means that the query planner likely cannot utilize the indexes, as each row must first execute the FUNCTION to get the condition result. Not always the case, but often found in naive SQL queries.
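Such predicates can often be rewritten so the index stays usable. A sketch, with hypothetical table and column names:

```sql
-- Non-sargable: the function hides the indexed column from the planner.
SELECT * FROM orders WHERE YEAR(created_at) = 2023;

-- Sargable rewrite: a range over the bare column can use an index on created_at.
SELECT * FROM orders
WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01';
```

The second form expresses the same condition as a range on the raw column, which the planner can satisfy with an index seek instead of a per-row function call.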
ANALYZE cannot make up for poor normalization. Notwithstanding the potentially large algorithmic constants, you will get far bigger big-O algorithmic wins by having proper normalization.
ANALYZE does not create indexes. If a query accesses columns that have high cardinality but no indexes, too bad. Knowing which columns to index, given the known or expected queries, is up to the DBA.
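When such a column is identified, the fix is a plain CREATE INDEX, followed by a statistics refresh so the planner knows about it. A sketch with hypothetical names:

```sql
-- Index a hypothetical high-cardinality column used in WHERE clauses:
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- Refresh statistics so the planner picks the new index up promptly:
ANALYZE TABLE orders;
```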
And as usual in cases like these, take all StackOverflow advice with a grain of salt: you are in charge of the data and programs, and have the actual running product in front of you. You can test, you can measure, you can poke around. We can't.