I am looking into adding a composite index to a table in a MySQL database which will likely be several million rows in size. The composite index will comprise two varchar columns as well as three int columns.
My question is as stated in the title: Is there an optimal order in which to create this composite index?
For instance, one of the int columns will likely only have 6 possible values; would it be better for that column to be closer to the front of the index definition? Likewise, one of the varchar columns will likely have millions of different values; should that be near the front or back of the index definition?
Execution is most efficient when you create a composite index with the columns in order from most to least distinct. In other words, the column that returns the highest count of distinct rows when queried with the DISTINCT keyword in the SELECT statement should come first in the composite index.
Selectivity of the individual columns in a composite index does not matter when picking the order.
So the order of columns in a multi-column index definitely matters. One type of query may need a certain column order for the index.
As a rule of thumb, in a multi-column index, you want the columns that have the highest cardinality, or in other words, the highest number of distinct values, to come first in the index.
To be more accurate, you want the column with the fewest possible matches to your search criteria first so you can narrow the result set down as much as possible, but in general, it's the same as the highest cardinality.
So, in your example, you'll want the column that will have millions of distinct values to be in the index before the one with only 6 distinct values.
Assuming you're selecting only one row out of the millions of values, it allows you to eliminate more rows faster.
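To check which column really has the higher cardinality before committing to an order, you can measure it directly. The following is only a sketch: the table events_demo and the columns session_key and status are made-up names standing in for the asker's million-value varchar column and 6-value int column.

```sql
-- Hypothetical table; all names and types are assumptions for illustration.
CREATE TABLE events_demo (
    id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    session_key VARCHAR(64) NOT NULL,  -- assumed: millions of distinct values
    status      INT NOT NULL           -- assumed: about 6 distinct values
);

-- Measure how many distinct values each candidate column actually holds.
SELECT COUNT(DISTINCT session_key) AS session_key_cardinality,
       COUNT(DISTINCT status)      AS status_cardinality,
       COUNT(*)                    AS total_rows
FROM events_demo;

-- Rule of thumb from this answer: the higher-cardinality column goes first.
CREATE INDEX idx_events_session_status
    ON events_demo (session_key, status);

-- Check whether the optimizer picks the index for an equality lookup.
EXPLAIN
SELECT id
FROM events_demo
WHERE session_key = 'abc123' AND status = 3;
```

Running the EXPLAIN against realistic data, rather than an empty table, gives a more trustworthy picture of which index the optimizer chooses.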
When considering two columns of similar cardinality, put the smaller one first (INTEGER columns before VARCHAR columns) because MySQL can compare and iterate over them faster.
One caveat is that if you are selecting with ranges (e.g. WHERE datecol > NOW()), then you want the range columns farthest to the right, and your columns with a single constant (e.g. WHERE id = 1) to the left. This is because your index can only be used for searching and ordering up to the point of the first range value.
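As a rough illustration of the range caveat, here is a sketch with a made-up table. bookings_demo and account_id are assumptions (account_id plays the role of the constant column, like id in the example above); datecol is the range column.

```sql
-- Hypothetical table; names and types are assumptions for illustration.
CREATE TABLE bookings_demo (
    booking_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    account_id INT NOT NULL,       -- compared against a single constant
    datecol    DATETIME NOT NULL   -- compared against a range
);

-- Equality column on the left, range column on the right.
CREATE INDEX idx_bookings_account_date
    ON bookings_demo (account_id, datecol);

-- Both conditions can narrow the scanned part of the index:
-- first jump to account_id = 1, then read only the dates after NOW().
EXPLAIN
SELECT booking_id
FROM bookings_demo
WHERE account_id = 1
  AND datecol > NOW();

-- With the reverse order (datecol, account_id), the range on datecol
-- comes first, so account_id can no longer narrow the scanned key range;
-- it can only filter entries inside that range.
```

Comparing the key_len and rows columns of EXPLAIN for both column orders on your own data is a reasonable way to confirm the difference.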