Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

DB indices to use for multiple combination of queries of same set of columns?

I'm seeking guidance on how to think through the minimum number of indices you need for a table in which you a performing different combinations of queries on the same set of columns. Ideally, your answer would abstract some rules of thumb from this specific example (if that's possible).

This bulleted list represents three different query conditions commonly performed on my table:

  • WHERE race_type = ? AND recordable_type = ? AND active = ?
  • WHERE race_type = ? AND recordable_id = ? AND recordable_type = ? AND active = ?
  • WHERE user_id = ? AND race_type = ? AND recordable_id = ? AND recordable_type = ? AND active = ?

Note: user_id (int), race_type (varchar), recordable_id (int), recordable_type (varchar), active (boolean)

I could create individual multi-column indices for each of these, but you DB performance experts out there might approach it a different way.

If I need to provide more info in order to get the best answer, please, let me know.

like image 903
keruilin Avatar asked Oct 22 '11 15:10

keruilin


People also ask

Can we create multiple index on same columns in SQL Server?

Yes, you can do it.

Which is the index used for multiple fields?

A multicolumn index is an index based on the values in multiple columns of a table.

Can a table have multiple indexes for multiple columns?

It is possible for an index to have two or more columns. Multi column indexes are also known as compound or concatenated indexes.


3 Answers

If your conditions are hierarchical (like in your example) you can use a combined index. DBMS's have trouble working with multiple indexes at the same time. Although it is possible and they try to make the best out of such situation.

This does not change the fact that you should try to have a specific index for a certain where clause. If more WHEREs' indexes can be combined to a single one, then you free up some space and CPU cycles.

Let's start out by specifying an index for every WHERE:

index1 (race_type, recordable_type, active)
index2 (race_type, recordable_id, recordable_type, active)
index3 (user_id, race_type, recordable_id, recordable_type, active)

In general you can optimize your order by ascending cardinality. Cardinality is the number of possible values that a column will have in your dataset. In your example active is a boolean. (Please note that the fact that boolean can have only two values is not really important. It could be int if you know that it will have only two values: 0 and 1).

The low cardinality of your active field means that with a single lookup we can eliminate half of the possible records (depending on your dataset of course). After this step your first index will look like:

index1 (active, race_type, recordable_type)

Besides cardinality you should pay attention to any logical hierarchy between the fields. Without knowing exactly what these names mean I surmise as a rule of thumb that certain race types will have their own recordables. - This won't eliminate the possibility of a recordable being used with more than one race type of course, but you have to choose an order and this seems to be the more logical one. - So we will use the race_type, recordable_type order.

Now let's take alook at the second index. You introduced recordable_id here. Without knowing your dataset I can safely assume that the cardinality of recordable_id will be biger than recordable_type's. In other words there will be more id's than types. Also I suspect a hierarchy between type and id (smells like one-to-many). So let's put it after the type like:

index2 (active, race_type, recordable_type, recordable_id)

Now it's time to pay attention to an other important angle. Indexes have their own cost on your HDD (esentially free) and CPU cycles when modifying your DB. The subset of any index can be used starting from left to right. index2 essentially contains index1 as it is index1 + recordable_id, so you can just get rid of it and end up with a single one.

Along cometh user_id. As an ID field it suggests high cardinality (many possible values), but note that it is not a rule that the "higher the cardinality the later a filed will be". We rather used cardinality as a beacon to help spot hierarchy-like relation between the fields. (And shrink index sizes).

Does user_id point to the individual contestant who's data we are looking at (many-many possibilities)? Or is it the client who uploaded the data (very few possbilities)? It is hard to tell. You can just append it to our existing index2 and you will end up with a single index that can be used in all three secnarios:

search_index (active, race_type, recordable_type, recordable_id, user_id)

... or it might worth having a second index for this scanario...

Your question is special because you only use = in your where clause. There are many other considerations if you had something like AND (race_type = 1 OR race_type=8) Not to mention > or <. Also if you use ORDER BY that can be factored in into the indexes you use.

like image 137
vbence Avatar answered Oct 12 '22 22:10

vbence


First step is to use EXPLAIN on the queries you are considering for optimization. MySQL explain will return vital information on which indices will be used to accomplish the query, and will help you optimize your queries.

In my experience I have seen tables take on any number of composite index permutations, it's really based on your application and which queries you will be issuing the most.

You should also consider changing your varchar columns to ids linking to lookup tables. It will add some extra schema to your database but you get the following benefits:

  1. If you ever need to change the value of the column, you only have to change one row, vs thousands.

  2. All columns you are considering for indices will be numeric, which by nature will be faster than varchars, and will give more overhead before reaching the maximum index length limit.

like image 31
Mike Purcell Avatar answered Oct 12 '22 21:10

Mike Purcell


imho

alter table your_table
add index ( race_type, recordable_type, active, user_id, recordable_id);
// watch-out the max length allowed for an index

the common found columns are race_type, recordable_type, active,
and I think by building an index will all 5 columns will fit all search patterns.

please let me know if the proposal does not work well

like image 2
ajreal Avatar answered Oct 12 '22 22:10

ajreal