I'm seeking guidance on how to think through the minimum number of indices you need for a table in which you a performing different combinations of queries on the same set of columns. Ideally, your answer would abstract some rules of thumb from this specific example (if that's possible). This bulleted list represents three different query conditions commonly performed on my table: <ul> <li>WHERE race_type = ? AND recordable_type = ? AND active = ?</li> <li>WHERE race_type = ? AND recordable_id = ? AND recordable_type = ? AND active = ?</li> <li>WHERE user_id = ? AND race_type = ? AND recordable_id = ? AND recordable_type = ? AND active = ?</li> </ul> Note: user_id (int), race_type (varchar), recordable_id (int), recordable_type (varchar), active (boolean) I could create individual multi-column indices for each of these, but you DB performance experts out there might approach it a different way. If I need to provide more info in order to get the best answer, please, let me know.

If your conditions are hierarchical (like in your example) you can use a combined index. DBMS's have trouble working with multiple indexes at the same time. Although it is possible and they try to make the best out of such situation. This does not change the fact that you should try to have a specific index for a certain where clause. If more WHEREs' indexes can be combined to a single one, then you free up some space and CPU cycles. Let's start out by specifying an index for every WHERE: <pre class="prettyprint"><code>index1 (race_type, recordable_type, active) index2 (race_type, recordable_id, recordable_type, active) index3 (user_id, race_type, recordable_id, recordable_type, active) </code></pre> In general you can optimize your order by ascending cardinality. Cardinality is the number of possible values that a column will have in your dataset. In your example <code>active</code> is a boolean. (Please note that the fact that <code>boolean</code> can have only two values is not really important. It could be <code>int</code> if you know that it will have only two values: 0 and 1). The low cardinality of your <code>active</code> field means that with a single lookup we can eliminate half of the possible records (depending on your dataset of course). After this step your first index will look like: <pre class="prettyprint"><code>index1 (active, race_type, recordable_type) </code></pre> Besides cardinality you should pay attention to any logical hierarchy between the fields. Without knowing exactly what these names mean I surmise as a rule of thumb that certain race types will have their own recordables. - This won't eliminate the possibility of a recordable being used with more than one race type of course, but you have to choose an order and this seems to be the more logical one. - So we will use the <code>race_type</code>, <code>recordable_type</code> order. Now let's take alook at the second index. You introduced <code>recordable_id</code> here. Without knowing your dataset I can safely assume that the cardinality of recordable_id will be biger than recordable_type's. In other words there will be more id's than types. Also I suspect a hierarchy between type and id (smells like one-to-many). So let's put it after the type like: <pre class="prettyprint"><code>index2 (active, race_type, recordable_type, recordable_id) </code></pre> Now it's time to pay attention to an other important angle. Indexes have their own cost on your HDD (esentially free) and CPU cycles when modifying your DB. The subset of any index can be used starting from left to right. <code>index2</code> essentially contains <code>index1</code> as it is <code>index1 + recordable_id</code>, so you can just get rid of it and end up with a single one. Along cometh <code>user_id</code>. As an ID field it suggests high cardinality (many possible values), but note that it is not a rule that the "higher the cardinality the later a filed will be". We rather used cardinality as a beacon to help spot hierarchy-like relation between the fields. (And shrink index sizes). Does <code>user_id</code> point to the individual contestant who's data we are looking at (many-many possibilities)? Or is it the client who uploaded the data (very few possbilities)? It is hard to tell. You can just append it to our existing <code>index2</code> and you will end up with a single index that can be used in all three secnarios: <pre class="prettyprint"><code>search_index (active, race_type, recordable_type, recordable_id, user_id) </code></pre> ... or it might worth having a second index for this scanario... Your question is special because you only use <code>=</code> in your where clause. There are many other considerations if you had something like <code>AND (race_type = 1 OR race_type=8)</code> Not to mention <code>></code> or <code><</code>. Also if you use <code>ORDER BY</code> that can be factored in into the indexes you use.

imho <pre class="prettyprint"><code>alter table your_table add index ( race_type, recordable_type, active, user_id, recordable_id); // watch-out the max length allowed for an index </code></pre> the common found columns are <code>race_type, recordable_type, active</code>, and I think by building an index will all 5 columns will fit all search patterns. please let me know if the proposal does not work well

DB indices to use for multiple combination of queries of same set of columns?

Tags:

performance

database

indexing

mysql

I'm seeking guidance on how to think through the minimum number of indices you need for a table in which you a performing different combinations of queries on the same set of columns. Ideally, your answer would abstract some rules of thumb from this specific example (if that's possible).

This bulleted list represents three different query conditions commonly performed on my table:

WHERE race_type = ? AND recordable_type = ? AND active = ?
WHERE race_type = ? AND recordable_id = ? AND recordable_type = ? AND active = ?
WHERE user_id = ? AND race_type = ? AND recordable_id = ? AND recordable_type = ? AND active = ?

Note: user_id (int), race_type (varchar), recordable_id (int), recordable_type (varchar), active (boolean)

I could create individual multi-column indices for each of these, but you DB performance experts out there might approach it a different way.

If I need to provide more info in order to get the best answer, please, let me know.

903

asked Oct 22 '11 15:10

keruilin

3 Answers

If your conditions are hierarchical (like in your example) you can use a combined index. DBMS's have trouble working with multiple indexes at the same time. Although it is possible and they try to make the best out of such situation.

This does not change the fact that you should try to have a specific index for a certain where clause. If more WHEREs' indexes can be combined to a single one, then you free up some space and CPU cycles.

Let's start out by specifying an index for every WHERE:

index1 (race_type, recordable_type, active)
index2 (race_type, recordable_id, recordable_type, active)
index3 (user_id, race_type, recordable_id, recordable_type, active)

In general you can optimize your order by ascending cardinality. Cardinality is the number of possible values that a column will have in your dataset. In your example active is a boolean. (Please note that the fact that boolean can have only two values is not really important. It could be int if you know that it will have only two values: 0 and 1).

The low cardinality of your active field means that with a single lookup we can eliminate half of the possible records (depending on your dataset of course). After this step your first index will look like:

index1 (active, race_type, recordable_type)

Besides cardinality you should pay attention to any logical hierarchy between the fields. Without knowing exactly what these names mean I surmise as a rule of thumb that certain race types will have their own recordables. - This won't eliminate the possibility of a recordable being used with more than one race type of course, but you have to choose an order and this seems to be the more logical one. - So we will use the race_type, recordable_type order.

Now let's take alook at the second index. You introduced recordable_id here. Without knowing your dataset I can safely assume that the cardinality of recordable_id will be biger than recordable_type's. In other words there will be more id's than types. Also I suspect a hierarchy between type and id (smells like one-to-many). So let's put it after the type like:

index2 (active, race_type, recordable_type, recordable_id)

Now it's time to pay attention to an other important angle. Indexes have their own cost on your HDD (esentially free) and CPU cycles when modifying your DB. The subset of any index can be used starting from left to right. index2 essentially contains index1 as it is index1 + recordable_id, so you can just get rid of it and end up with a single one.

Along cometh user_id. As an ID field it suggests high cardinality (many possible values), but note that it is not a rule that the "higher the cardinality the later a filed will be". We rather used cardinality as a beacon to help spot hierarchy-like relation between the fields. (And shrink index sizes).

Does user_id point to the individual contestant who's data we are looking at (many-many possibilities)? Or is it the client who uploaded the data (very few possbilities)? It is hard to tell. You can just append it to our existing index2 and you will end up with a single index that can be used in all three secnarios:

search_index (active, race_type, recordable_type, recordable_id, user_id)

... or it might worth having a second index for this scanario...

Your question is special because you only use = in your where clause. There are many other considerations if you had something like AND (race_type = 1 OR race_type=8) Not to mention > or <. Also if you use ORDER BY that can be factored in into the indexes you use.

137

answered Oct 12 '22 22:10

vbence

First step is to use EXPLAIN on the queries you are considering for optimization. MySQL explain will return vital information on which indices will be used to accomplish the query, and will help you optimize your queries.

In my experience I have seen tables take on any number of composite index permutations, it's really based on your application and which queries you will be issuing the most.

You should also consider changing your varchar columns to ids linking to lookup tables. It will add some extra schema to your database but you get the following benefits:

If you ever need to change the value of the column, you only have to change one row, vs thousands.
All columns you are considering for indices will be numeric, which by nature will be faster than varchars, and will give more overhead before reaching the maximum index length limit.

answered Oct 12 '22 21:10

Mike Purcell

imho

alter table your_table
add index ( race_type, recordable_type, active, user_id, recordable_id);
// watch-out the max length allowed for an index

the common found columns are race_type, recordable_type, active,
and I think by building an index will all 5 columns will fit all search patterns.

please let me know if the proposal does not work well

answered Oct 12 '22 22:10

ajreal

Related questions
                            
                                Javascript Confirm Delete in One PHP File (on href)
                            
                                Sending passwords over the web
                            
                                How to insert a string that contains quote marks into a database?
                            
                                undefined method `eq' for nil:NilClass with rails 3 and ruby enterprise on ubuntu hardy
                            
                                Select most occurring value in MySQL
                            
                                MySQL rollback on the MyISAM engine
                            
                                Mysql WHERE problem with comma-separated list
                            
                                mysql join with multiple values in one column
                            
                                Unable to start rails server
                            
                                PHP/MYSQL advanced search script. How?
                            
                                SQL Query for all pairs of elements that are only in different groups
                            
                                MYSQL command to execute multiple .sql files
                            
                                Imported tables are not showing up in phpmyadmin [closed]
                            
                                $(a,this).attr('href') returns undefined
                            
                                How to use mysql_real_escape_string function in PHP
                            
                                How to connect NetBeans to MySQL database?
                            
                                Import large MySQL .sql file on Windows with Force
                            
                                How do you store scientific notation numbers in MySQL
                            
                                MySql auto-incrementing Alpha-numeric primary key?
                            
                                which datatype i should use for mapping to boolean

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With