Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there any performance gain in indexing a boolean field?

Just to put a finer point on several other answers here, since in my experience, those looking at questions like this are in the same boat we were, we've all heard that indexing Boolean fields is pointless, and yet...

We have a table with about 4 million rows, only about 1000 or so at a time will have a Boolean switch flagged and that's what we search against. Adding an index on our Boolean field sped up queries by orders of magnitude, it went from about 9+ seconds to a fraction of a second.


Not really. You should think about it like a book. If there were only 3 kinds of words in a book and you index all of them, you would have the same number of index pages as normal pages.

There would be a performance gain if there are relatively few records of one value. For example, if you have 1000 records and 10 of them are TRUE, then it would be useful if you searching with isok = 1

As Michael Durrant mentioned, it also makes writes slower.

EDIT: Possible duplication: Indexing boolean fields

Here it explains that even if you have an index, if you have too many records it doesn't use the index anyways. MySQL not using index when checking = 1 , but using it with = 0


It depends on the actual queries and the selectivity of the index/query combination.

Case A: condition WHERE isok = 1 and nothing else there:

SELECT *
FROM tableX
WHERE isok = 1
  • If the index is selective enough (say you have 1M rows and only 1k have isok = 1), then the SQL engine will probably use the index and be faster than without it.

  • If the index is not selective enough (say you have 1M rows and more than 100k have isok = 1), then the SQL engine will probably not use the index and do a table scan.

Case B: condition WHERE isok = 1 and more stuff:

SELECT *
FROM tableX
WHERE isok = 1
  AND another_column = 17

Then, it depends on what other indexes you have. An index on another_column would probably be more selective than the index on isok which has only two possible values. An index on (another_column, isok) or (isok, another_column) would be even better.


It depends on the distribution of the data.

Imagine I had a book with 1000 closely typed pages, and the only words in my book were 'yes' and 'no' repeated over and over and distributed randomly. If I was asked to circle all the instances of 'yes', would an index in the back of the book help? It depends.

If there was a half-and-half random distribution of yes's and no's, then looking up in the index wouldn't help. The index would make the book a lot bigger, and anyway I'd be quicker just to start from the front and work my way through each page looking for all the instances of 'yes' and circling them, rather than looking up each item in the index and then taking the reference from the index entry to the page that it refers to.

But if there were, say, just ten instances of 'yes' in my thousand-page book and everything else was just millions of no's, then an index would save me loads of time in finding those ten instances of 'yes' and circling them.

It's the same in databases. If it's a 50:50 distribution, then an index isn't going to help - the database engine is better off just ploughing through the data from start to finish (full table scan), and the index would just make the database bigger, and slower to write and update. But if it is something like a 4000:1 distribution (as per oucil in this thread), then an index seek can speed it up hugely, if it is the 1 in 4000 items that you are looking for.


No, usually not.

You usually index fields for searching when they have high selectivity/cardinality. A boolean field's cardinality is very low in most tables. It would also make your writes fractionally slower.


Actually this depends on queries you run. But, generally yes, as well as indexing a field of any other type.