I remember reading at one point that indexing a field with low cardinality (a low number of distinct values) is not really worth doing. I admit I don't know enough about how indexes work to understand why that is.
So what if I have a table with 100 million rows in it, and I am selecting records where a bit field is 1? And let's say that at any point in time, there are only a handful of records where the bit field is 1 (as opposed to 0). Is it worth indexing that bit field or not? Why?
Of course I can just test it and check the execution plan, and I will do that, but I'm also curious about the theory behind it. When does cardinality matter and when does it not?
Decide which fields to index You'll probably want to index fields that you search frequently, fields that you sort, and fields that you join to fields in other tables in multiple table queries. Indexes can speed up searches and queries, but they can slow down performance when you add or update data.
A bitmap index is a special kind of database index which uses bitmaps or bit array. In a bitmap index, Oracle stores a bitmap for each index key. Each index key stores pointers to multiple rows. For example, if you create a bitmap index on the gender column of the members table.
Indexes should not be used on small tables. Tables that have frequent, large batch updates or insert operations. Indexes should not be used on columns that contain a high number of NULL values. Columns that are frequently manipulated should not be indexed.
Indexes are used to retrieve data from the database more quickly than otherwise. The users cannot see the indexes, they are just used to speed up searches/queries. Note: Updating a table with indexes takes more time than updating a table without (because the indexes also need an update).
Consider what an index is in SQL - and index is really a chunk of memory pointing at other chunks of memory (i.e. pointers to rows). The index is broken into pages so that portions of the index can be loaded and unloaded from memory depending on usage.
When you ask for a set of rows, SQL uses the index to find the rows more quickly than table scanning (looking at every row).
SQL has clustered and non-clustered indexes. My understanding of clustered indexes is that they group similar index values into the same page. This way when you ask for all the rows matching an index value, SQL can return those rows from a clustered page of memory. This is why trying to cluster index a GUID column is a bad idea - you don't try to cluster random values.
When you index an integer column, SQL's index contains a set of rows for each index value. If you have a range of 1 to 10, then you would have 10 index pointers. Depending on how many rows there are this can be paged differently. If your query looks for the index matching "1" and then where Name contains "Fred" (assuming the Name column is not indexed), SQL gets the set of rows matching "1" very quickly, then table scans to find the rest.
So what SQL is really doing is trying to reduce the working set (number of rows) it has to iterate over.
When you index a bit field (or some narrow range), you only reduce the working set by the number of rows matching that value. If you have a small number of rows matching it would reduce your working set a lot. For a large number of rows with 50/50 distribution, it might buy you very little performance gain vs. keeping the index up to date.
The reason everyone says to test is because SQL contains a very clever and complex optimizer that may ignore an index if it decides table scanning is faster, or may use a sort, or may organize memory pages however it darn well likes.
I just came across this question by way of another. Assuming that your statement that only a handful of the records assume the value of 1 (and that those are the ones you're interested in), then a filtered index could be a good choice. Something like:
create index [IX_foobar] on dbo.Foobar (FooID) where yourBitColumn = 1
This will create a substantially smaller index that the optimizer is smart enough to use when that is a predicate in your query.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With