So, most database experts say it's not effective creating an index on a column with very few unique values, relative to the size of the table.
Based on how databases work internally (I know most databases store indices using a B-tree), why does a B-Tree with few unique values make searching inefficient?
No, you dont have to index it again. When you specify UNIQUE KEY , the column is indexed. So it has no difference in performance with other indexed column (e.g. PRIMARY KEY) of same type. However if the type is different, there will be little performance difference.
Its disadvantages include increased disk space, slower data modification, and updating records in the clustered index.
If you create an index such as INDEX(first_name, last_name), don't create INDEX(first_name). However, "index prefix" or "multi-columns index" is not recommended in all search cases. Use the NOT NULL attribute for those columns in which you consider the indexing, so that NULL values will never be stored.
index: if it's not primary or unique, it doesn't constrain values inserted into the table, but it does allow them to be looked up more efficiently.
First, you need to understand how an index on a column works. In simple words it is nothing but,
an ordered list of all possible values in the given column with a pointer back to the actual record in the database.
Since it is ordered, a binary search can be used against it, rather than a linear search, which improves performance over a large dataset.
Imagine then, your index as a phone book ordered by a column, say last name
; but within the set of records with a similar last name
, there isn't a common pattern or meaningful order for the records: they are ordered purely random. And say we need to search this record:
Ike Smith 4783 Random Ave. Seattle, WA 98117
Since the phone book is ordered by last name
, we need only to go to the S
, then the m
, then the i
, etc. until we find Smith
. And (hopefully) there are only a couple of records under Smith
so we find the one we want fairly quickly.
Now, imagine you have a phone book ordered by city
instead of last name
. And within the records that match a given city
there is no particular order. And so we try our search again. However, once we find Seattle
(using a extremely sophisticated binary search) we are left with close to 620,778 records, which we have to check sequentially as they ordered completely random. We're stuck checking every single entry for the record we want.
This is what happens when you use a very common column as the base of your index: the binary search returns a very large set of possible records with which the database can't make any assumptions beyond the initial indexed column values, so it needs to check sequentially within the resulting set for a matching record.
If the phone book were instead ordered by zip code
(a less common variable), then you might find yourself only searching for 18,623 records residing on 98117
.
Furthermore, a true phone book usually resembles a composite index: instead of just ordering by a single column (i.e. last name
), the values within the resulting set are then ordered by another column (say first name
), and then another (middle name
?) so the search can be done sub linearly at every step until you find the record needed. It it basically an index within an index, where even if the first column is not that common, the combination with the second one provides a specific enough criteria that only a small set of records need to be search linearly.
In general, the goal of the index is to provide faster than linear search by avoiding having to scan through a significant portion of the data in the table (see http://en.wikipedia.org/wiki/Database_index). If many of the would-be indexed values are identical, the database has to scan a significant portion of the table even after a successful index lookup.
So an index that has few unique values would provide very little performance benefit independent of its implementation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With