MySQL index cardinality - performance vs storage efficiency


Say you have a MySQL 5.0 MyISAM table with 100 million rows and one index (other than the primary key) on two integer columns.

From my admittedly poor understanding of B-tree structure, I believe that a lower cardinality means better storage efficiency for the index, because there are fewer parent nodes, whereas a higher cardinality means less efficient storage but faster read performance, because the engine has to navigate through fewer branches to reach the data it needs to narrow down the rows for the query.

(Note: by "low" vs. "high" I don't mean, e.g., 1 million vs. 99 million distinct values for a 100-million-row table. I mean something more like 90 million vs. 95 million.)

Is my understanding correct?

Related question - How does cardinality affect write performance?

asked Apr 08 '10 02:04 by Sean




1 Answer

Whereas a higher cardinality means less efficient storage, but faster read performance, because it has to navigate through less branches to get to whatever data it is looking for to narrow down the rows for the query.

Higher cardinality means better read performance because, by definition, there are fewer records per key value to read.
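To put the question's own numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes (as a simplification) that rows are spread evenly across the distinct values. Under that assumption, 90 million vs. 95 million distinct values works out to roughly 1.11 vs. 1.05 rows per equality lookup, so the extra rows to read are small either way.

```python
def avg_rows_per_value(total_rows: int, distinct_values: int) -> float:
    """Average number of rows matching one key value.

    Assumes a uniform distribution of rows over distinct values,
    which real data rarely satisfies exactly.
    """
    return total_rows / distinct_values


TOTAL_ROWS = 100_000_000  # table size from the question
for distinct in (90_000_000, 95_000_000):
    rows = avg_rows_per_value(TOTAL_ROWS, distinct)
    print(f"{distinct:,} distinct values -> {rows:.3f} rows per value on average")
```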

To process a query like this:

SELECT * FROM mytable WHERE indexed_col = @myvalue

the engine should perform the following steps:

  1. Find the first entry satisfying the condition.

    This is done by traversing the B-Tree, starting from the root entry.

    Across pages, the search follows B-Tree links; within a page, it uses binary search (unless your keys are compressed, in which case it is a linear search).

    This algorithm has the same efficiency for both high-cardinality and low-cardinality columns. Finding the first 3 (as opposed to any 3) in these lists:

    1  2  3  4  5  6  7  8  9  10

    3  3  3  3  3  3  3  3  4  4

    requires the same O(log n) number of steps.

  2. Traverse the index until the key value changes. This requires linear time: the more matching records there are, the more index entries must be traversed.
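The two steps above can be sketched in Python under a deliberate simplification: a single flat sorted list stands in for the B-tree's leaf level, so there are no pages or links. The function name `index_lookup` is hypothetical. Step 1 uses `bisect.bisect_left`, which finds the *first* position of the value in O(log n). Step 2 walks forward until the key changes, which costs one step per matching entry.

```python
from bisect import bisect_left


def index_lookup(sorted_keys, value):
    """Return (position of first match, number of matching entries).

    Hypothetical helper: a flat sorted list models the index leaf level.
    """
    # Step 1: binary search for the first entry equal to `value`, O(log n).
    start = bisect_left(sorted_keys, value)
    # Step 2: scan forward until the key value changes, O(number of matches).
    pos = start
    while pos < len(sorted_keys) and sorted_keys[pos] == value:
        pos += 1
    return start, pos - start


high_card = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # the two lists from the answer
low_card = [3, 3, 3, 3, 3, 3, 3, 3, 4, 4]
print(index_lookup(high_card, 3))  # (2, 1): one matching entry to read
print(index_lookup(low_card, 3))   # (0, 8): eight matching entries to read
```

Only step 2 depends on how many entries share the value. That is where cardinality shows up in read cost.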

If you only need the first record:

SELECT * FROM mytable WHERE indexed_col = @myvalue LIMIT 1

then the column cardinality does not affect read performance.
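As a rough illustration of the `LIMIT 1` case, the following sketch counts binary-search probes on two hypothetical one-million-entry key lists. One list has all distinct values; the other is 90% duplicates of the searched value. A flat sorted list again stands in for the index, and `first_match` is an illustrative name, not a MySQL routine. Because the search stops at the first match with no forward scan, both lookups stay within about log2(1,000,000), i.e. at most 20 probes, regardless of how many duplicates exist.

```python
def first_match(sorted_keys, value):
    """Binary search for the first entry == value (like LIMIT 1).

    Returns (position or None, number of probes made).
    """
    lo, hi, probes = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] < value:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == value
    return (lo if found else None), probes


# Hypothetical data: all-distinct keys vs. heavily duplicated keys.
high_card = list(range(1_000_000))
low_card = [3] * 900_000 + [4] * 100_000

for name, keys in (("high cardinality", high_card), ("low cardinality", low_card)):
    pos, probes = first_match(keys, 3)
    print(f"{name}: first 3 at position {pos} after {probes} probes")
```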

How does cardinality affect write performance?

Each index key carries a hidden additional value: a record pointer. This is the whole point of having an index: you need to know which record each key points to.

Since a record pointer is unique by definition, each index entry (the key value plus its record pointer) is unique too. Index entries that share the same key value are sorted by record pointer.

This is what keeps the index maintainable. If you delete a record whose indexed value is shared by a million other records, the corresponding index entry must be deleted too. The engine does not look through all million index entries with that value; instead, it uses the record pointer as an additional search condition.

So each index entry is in fact unique (even if you don't define the index as UNIQUE) and therefore has the maximum possible cardinality.
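The idea that the record pointer turns every entry into a unique, fully ordered key can be sketched with Python tuples. The names below are hypothetical: `delete_entry`, `insert_entry`, and integer row ids standing in for record pointers. Each entry is `(key_value, record_pointer)`. A million entries share key 42, yet a binary search on the full tuple lands directly on the one entry to remove, without scanning the duplicates. One caveat applies to this simplification: deleting from a Python list shifts the remaining elements, whereas a real B-tree only modifies the page that holds the entry.

```python
from bisect import bisect_left, insort

# Hypothetical index: (key_value, record_pointer) pairs, sorted.
# A million rows share key 42; the pointer makes every entry unique.
index = [(42, rowid) for rowid in range(1_000_000)] + [(43, 1_000_000)]


def delete_entry(index, key, rowid):
    """Locate (key, rowid) with an O(log n) search, using the record pointer
    as an extra search condition, and remove it. Returns True if found."""
    pos = bisect_left(index, (key, rowid))
    if pos < len(index) and index[pos] == (key, rowid):
        del index[pos]  # list shift here; a B-tree would touch one page
        return True
    return False


def insert_entry(index, key, rowid):
    """Insert (key, rowid) at its sorted position (search is O(log n))."""
    insort(index, (key, rowid))


print(delete_entry(index, 42, 765_432))  # True: found without scanning duplicates
print(delete_entry(index, 42, 765_432))  # False: already gone
insert_entry(index, 42, 765_432)
```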

So the answer to your question is: no, column cardinality does not affect index write performance.

answered Oct 14 '22 08:10 by Quassnoi