Does it make sense to use an index that will have a low cardinality?

I'm mainly an Actionscript developer and by no means an expert in SQL, but from time to time I have to develop simple server side stuff. So, I thought I'd ask more experienced people about the question in the title.

My understanding is that you don't gain much by putting an index on a column that will hold few distinct values. I have a column that holds a boolean value (actually it's a small int, but I'm using it as a flag), and this column is used in the WHERE clauses of most of my queries. In a theoretical "average" case, half of the records will have the value 1 and the other half 0. So, in this scenario, the database engine could avoid a full table scan, but would still have to read a lot of rows (total rows / 2).

So, should I index this column?

For the record, I'm using MySQL 5, but I'm more interested in a general rationale on why it does or does not make sense to index a column that I know will have a low cardinality.

Thanks in advance.

Juan Pablo Califano asked Jan 21 '10 21:01



1 Answer

An index can help even on low cardinality fields in several cases:

  1. When one of the possible values is very infrequent compared to the other values and you search for it.

    For instance, there are very few color blind women, so this query:

    SELECT  *
    FROM    color_blind_people
    WHERE   gender = 'F'

    would most probably benefit from an index on gender.

  2. When the values tend to be grouped in the table order:

    SELECT  *
    FROM    records_from_2008
    WHERE   year = 2010
    LIMIT 1

    Though there are only 3 distinct years here, records with earlier years were most probably added first, so without the index a great many records would have to be scanned before the first 2010 record is returned.

  3. When you need ORDER BY / LIMIT:

    SELECT  *
    FROM    people
    ORDER BY
            gender, id
    LIMIT 1

    Without the index, a filesort would be required. Though it's somewhat optimized due to the LIMIT, it would still need a full table scan.

  4. When the index covers all fields used in the query:

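    -- the index name is illustrative
    CREATE INDEX ix_mytable_lcr_value ON mytable (low_cardinality_record, value);

    SELECT  SUM(value)
    FROM    mytable
    WHERE   low_cardinality_record = 3

    The query can then be answered from the index alone, without reading the table rows.

  5. When you need DISTINCT: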

    SELECT  DISTINCT color
    FROM    tshirts

    With an index on color, MySQL will use a loose index scan (reported as "Using index for group-by" in EXPLAIN), and if you have few colors, this query will be instant even with millions of records.

    This is an example of a scenario when the index on a low cardinality field is more efficient than that on a high cardinality field.
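
To make case 4 (the covering index) concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL, purely so that it is self-contained and runnable. The index name ix_lcr_value, the payload column and the row counts are made up for illustration, and SQLite's planner is not MySQL's, so treat the plan text as indicative only. In MySQL the analogous check is EXPLAIN, where a covering index typically shows up as "Using index" in the Extra column.

```python
import sqlite3


def covering_index_demo():
    """Build a small table and return (query plan text, SUM result)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable ("
        "id INTEGER PRIMARY KEY, "
        "low_cardinality_record INTEGER, "
        "value INTEGER, "
        "payload TEXT)"  # hypothetical extra column that the query never reads
    )
    # 1000 rows; low_cardinality_record cycles through 0..3, value is always 1.
    conn.executemany(
        "INSERT INTO mytable (low_cardinality_record, value, payload) "
        "VALUES (?, ?, ?)",
        [(i % 4, 1, "x" * 50) for i in range(1000)],
    )
    # The index holds every column the query touches, so it "covers" the query.
    conn.execute(
        "CREATE INDEX ix_lcr_value ON mytable (low_cardinality_record, value)"
    )
    query = "SELECT SUM(value) FROM mytable WHERE low_cardinality_record = 3"
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    plan = " | ".join(
        row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)
    )
    total = conn.execute(query).fetchone()[0]
    conn.close()
    return plan, total


plan, total = covering_index_demo()
print(plan)
print(total)
```

If the plan mentions COVERING INDEX, the sum was computed from the index entries alone, even though a quarter of the table matches the WHERE clause.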

Note that if DML performance is not much of an issue, then it's safe to create the index.

If the optimizer thinks that the index is inefficient, it simply will not use it.
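
Whether the optimizer actually picks an index is something you can check rather than guess. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere), with the color_blind_people table from case 1. The column list, the index name ix_gender and the 1-in-200 skew are invented for illustration. It compares the plan for the same query before and after the index exists. In MySQL, prefixing the query with EXPLAIN serves the same purpose, though its optimizer may decide differently.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's plan for sql as a single readable string."""
    return " | ".join(
        row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE color_blind_people ("
    "id INTEGER PRIMARY KEY, name TEXT, gender TEXT)"
)
# 2000 rows, of which only every 200th is 'F' (a deliberately skewed flag).
conn.executemany(
    "INSERT INTO color_blind_people (name, gender) VALUES (?, ?)",
    [("person%d" % i, "F" if i % 200 == 0 else "M") for i in range(2000)],
)

query = "SELECT * FROM color_blind_people WHERE gender = 'F'"

plan_without_index = query_plan(conn, query)  # no index yet: table scan
conn.execute("CREATE INDEX ix_gender ON color_blind_people (gender)")
plan_with_index = query_plan(conn, query)     # planner may now use ix_gender

matching_rows = conn.execute(query).fetchall()
conn.close()

print(plan_without_index)
print(plan_with_index)
print(len(matching_rows))
```

Comparing the two plan strings shows whether the index changed anything for this particular query; if the plan stays a scan, the index is only costing you on writes.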

Quassnoi answered Sep 19 '22 16:09