Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Should I index a bit field in SQL Server?

I remember reading at one point that indexing a field with low cardinality (a low number of distinct values) is not really worth doing. I admit I don't know enough about how indexes work to understand why that is.

So what if I have a table with 100 million rows in it, and I am selecting records where a bit field is 1? And let's say that at any point in time, there are only a handful of records where the bit field is 1 (as opposed to 0). Is it worth indexing that bit field or not? Why?

Of course I can just test it and check the execution plan, and I will do that, but I'm also curious about the theory behind it. When does cardinality matter and when does it not?

like image 304
jeremcc Avatar asked Oct 23 '08 19:10

jeremcc


People also ask

What field should be indexed?

Decide which fields to index You'll probably want to index fields that you search frequently, fields that you sort, and fields that you join to fields in other tables in multiple table queries. Indexes can speed up searches and queries, but they can slow down performance when you add or update data.

What is a bit index in SQL?

A bitmap index is a special kind of database index which uses bitmaps or bit array. In a bitmap index, Oracle stores a bitmap for each index key. Each index key stores pointers to multiple rows. For example, if you create a bitmap index on the gender column of the members table.

When should indexes be avoided in SQL?

Indexes should not be used on small tables. Tables that have frequent, large batch updates or insert operations. Indexes should not be used on columns that contain a high number of NULL values. Columns that are frequently manipulated should not be indexed.

When should I use index in SQL Server?

Indexes are used to retrieve data from the database more quickly than otherwise. The users cannot see the indexes, they are just used to speed up searches/queries. Note: Updating a table with indexes takes more time than updating a table without (because the indexes also need an update).


2 Answers

Consider what an index is in SQL - and index is really a chunk of memory pointing at other chunks of memory (i.e. pointers to rows). The index is broken into pages so that portions of the index can be loaded and unloaded from memory depending on usage.

When you ask for a set of rows, SQL uses the index to find the rows more quickly than table scanning (looking at every row).

SQL has clustered and non-clustered indexes. My understanding of clustered indexes is that they group similar index values into the same page. This way when you ask for all the rows matching an index value, SQL can return those rows from a clustered page of memory. This is why trying to cluster index a GUID column is a bad idea - you don't try to cluster random values.

When you index an integer column, SQL's index contains a set of rows for each index value. If you have a range of 1 to 10, then you would have 10 index pointers. Depending on how many rows there are this can be paged differently. If your query looks for the index matching "1" and then where Name contains "Fred" (assuming the Name column is not indexed), SQL gets the set of rows matching "1" very quickly, then table scans to find the rest.

So what SQL is really doing is trying to reduce the working set (number of rows) it has to iterate over.

When you index a bit field (or some narrow range), you only reduce the working set by the number of rows matching that value. If you have a small number of rows matching it would reduce your working set a lot. For a large number of rows with 50/50 distribution, it might buy you very little performance gain vs. keeping the index up to date.

The reason everyone says to test is because SQL contains a very clever and complex optimizer that may ignore an index if it decides table scanning is faster, or may use a sort, or may organize memory pages however it darn well likes.

like image 100
Geoff Cox Avatar answered Sep 22 '22 01:09

Geoff Cox


I just came across this question by way of another. Assuming that your statement that only a handful of the records assume the value of 1 (and that those are the ones you're interested in), then a filtered index could be a good choice. Something like:

create index [IX_foobar] on dbo.Foobar (FooID) where yourBitColumn = 1 

This will create a substantially smaller index that the optimizer is smart enough to use when that is a predicate in your query.

like image 32
Ben Thul Avatar answered Sep 21 '22 01:09

Ben Thul