I'm seeking guidance on how to think through the minimum number of indices you need for a table in which you a performing different combinations of queries on the same set of columns. Ideally, your answer would abstract some rules of thumb from this specific example (if that's possible).
This bulleted list represents three different query conditions commonly performed on my table:
Note: user_id (int), race_type (varchar), recordable_id (int), recordable_type (varchar), active (boolean)
I could create individual multi-column indices for each of these, but you DB performance experts out there might approach it a different way.
If I need to provide more info in order to get the best answer, please, let me know.
Yes, you can do it.
A multicolumn index is an index based on the values in multiple columns of a table.
It is possible for an index to have two or more columns. Multi column indexes are also known as compound or concatenated indexes.
If your conditions are hierarchical (like in your example) you can use a combined index. DBMS's have trouble working with multiple indexes at the same time. Although it is possible and they try to make the best out of such situation.
This does not change the fact that you should try to have a specific index for a certain where clause. If more WHEREs' indexes can be combined to a single one, then you free up some space and CPU cycles.
Let's start out by specifying an index for every WHERE:
index1 (race_type, recordable_type, active)
index2 (race_type, recordable_id, recordable_type, active)
index3 (user_id, race_type, recordable_id, recordable_type, active)
In general you can optimize your order by ascending cardinality. Cardinality is the number of possible values that a column will have in your dataset. In your example active
is a boolean. (Please note that the fact that boolean
can have only two values is not really important. It could be int
if you know that it will have only two values: 0 and 1).
The low cardinality of your active
field means that with a single lookup we can eliminate half of the possible records (depending on your dataset of course). After this step your first index will look like:
index1 (active, race_type, recordable_type)
Besides cardinality you should pay attention to any logical hierarchy between the fields. Without knowing exactly what these names mean I surmise as a rule of thumb that certain race types will have their own recordables. - This won't eliminate the possibility of a recordable being used with more than one race type of course, but you have to choose an order and this seems to be the more logical one. - So we will use the race_type
, recordable_type
order.
Now let's take alook at the second index. You introduced recordable_id
here. Without knowing your dataset I can safely assume that the cardinality of recordable_id will be biger than recordable_type's. In other words there will be more id's than types. Also I suspect a hierarchy between type and id (smells like one-to-many). So let's put it after the type like:
index2 (active, race_type, recordable_type, recordable_id)
Now it's time to pay attention to an other important angle. Indexes have their own cost on your HDD (esentially free) and CPU cycles when modifying your DB. The subset of any index can be used starting from left to right. index2
essentially contains index1
as it is index1 + recordable_id
, so you can just get rid of it and end up with a single one.
Along cometh user_id
. As an ID field it suggests high cardinality (many possible values), but note that it is not a rule that the "higher the cardinality the later a filed will be". We rather used cardinality as a beacon to help spot hierarchy-like relation between the fields. (And shrink index sizes).
Does user_id
point to the individual contestant who's data we are looking at (many-many possibilities)? Or is it the client who uploaded the data (very few possbilities)? It is hard to tell. You can just append it to our existing index2
and you will end up with a single index that can be used in all three secnarios:
search_index (active, race_type, recordable_type, recordable_id, user_id)
... or it might worth having a second index for this scanario...
Your question is special because you only use =
in your where clause. There are many other considerations if you had something like AND (race_type = 1 OR race_type=8)
Not to mention >
or <
. Also if you use ORDER BY
that can be factored in into the indexes you use.
First step is to use EXPLAIN
on the queries you are considering for optimization. MySQL explain will return vital information on which indices will be used to accomplish the query, and will help you optimize your queries.
In my experience I have seen tables take on any number of composite index permutations, it's really based on your application and which queries you will be issuing the most.
You should also consider changing your varchar columns to ids linking to lookup tables. It will add some extra schema to your database but you get the following benefits:
If you ever need to change the value of the column, you only have to change one row, vs thousands.
All columns you are considering for indices will be numeric, which by nature will be faster than varchars, and will give more overhead before reaching the maximum index length limit.
imho
alter table your_table
add index ( race_type, recordable_type, active, user_id, recordable_id);
// watch-out the max length allowed for an index
the common found columns are race_type, recordable_type, active
,
and I think by building an index will all 5 columns will fit all search patterns.
please let me know if the proposal does not work well
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With