Efficiency of indexes for a field with low cardinality

Question

For example, there is a nullable field in a Postgres database that stores an enum value, and the enum has only two values, A and B.

All of my SELECT queries have a WHERE clause on this field.

Would adding an index on this field be a good approach, or would it not improve performance, since each row contains either A, B, or NULL?

Is there a way I can improve the performance of all my GET calls?

Please help

jjanes · Accepted Answer

An index just on that column is unlikely to be useful, unless the distribution of values is very skewed (e.g. 99% A, 0.99% NULL, 0.01% B). But in that case you would probably be better off with a partial index on some other field WHERE this_field='B'.
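As a rough sketch of the partial-index idea, assuming a hypothetical table named items (only this_field and other_field come from the answer; the table name, the id column, the index name, and the use of a text column with a CHECK constraint in place of the enum are illustrative assumptions):

```sql
-- Hypothetical table; a text column with a CHECK stands in for the enum.
CREATE TABLE IF NOT EXISTS items (
    id          bigserial PRIMARY KEY,
    this_field  text CHECK (this_field IN ('A', 'B')),  -- nullable
    other_field integer
);

-- Partial index: only rows holding the rare value 'B' are indexed,
-- so the index stays small even if the table is large.
CREATE INDEX IF NOT EXISTS items_other_field_b_idx
    ON items (other_field)
    WHERE this_field = 'B';

-- The query's WHERE clause implies the index predicate, so the planner
-- is allowed to consider the partial index for it.
EXPLAIN
SELECT * FROM items
WHERE this_field = 'B' AND other_field = 7945;
```

Whether the planner actually picks the index depends on table size and statistics, so it is worth checking the EXPLAIN output on real data.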

But even with a more uniform distribution of values (33.33% A, 33.33% NULL, 33.33% B) it could be useful to include that column as the leading column in some multicolumn indexes. For example, for WHERE this_field='A' AND other_field=7945, the index on (this_field, other_field) would generally be about 3 times more efficient than one on just (other_field) if the distribution of values is even, because the single-column index also returns the 'B' and NULL entries for that other_field value, which then have to be filtered out.
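A minimal sketch of such a multicolumn index, again assuming a hypothetical items table (the table name, id column, and index name are illustrative, and a text column with a CHECK constraint stands in for the enum):

```sql
-- Hypothetical table, created only if it does not already exist.
CREATE TABLE IF NOT EXISTS items (
    id          bigserial PRIMARY KEY,
    this_field  text CHECK (this_field IN ('A', 'B')),  -- nullable
    other_field integer
);

-- The low-cardinality column leads; the selective column follows.
CREATE INDEX IF NOT EXISTS items_this_other_idx
    ON items (this_field, other_field);

-- Both equality conditions can be used as index conditions, so only
-- entries with this_field = 'A' AND other_field = 7945 are visited.
EXPLAIN
SELECT * FROM items
WHERE this_field = 'A' AND other_field = 7945;
```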

Where it could make a huge difference is with something like WHERE this_field='A' ORDER BY other_field LIMIT 5. With the index on (this_field, other_field) it can jump right to the proper spot in the index and read off the first 5 rows (which pass checking for visibility) already in order and then stop. If the index were just on (other_field) it might, if the two columns are not statistically independent of each other, have to skip over an arbitrary number of 'B' or NULL rows before finding 5 with 'A'.
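The ORDER BY ... LIMIT case can be tried with a sketch like the following, under the same assumptions as before (hypothetical items table and index name, text plus CHECK in place of the enum):

```sql
-- Hypothetical table, created only if it does not already exist.
CREATE TABLE IF NOT EXISTS items (
    id          bigserial PRIMARY KEY,
    this_field  text CHECK (this_field IN ('A', 'B')),  -- nullable
    other_field integer
);

CREATE INDEX IF NOT EXISTS items_this_other_idx
    ON items (this_field, other_field);

-- Within the this_field = 'A' portion of the index, entries are already
-- sorted by other_field, so an index scan can stop after 5 visible rows
-- without a separate sort step.
EXPLAIN
SELECT * FROM items
WHERE this_field = 'A'
ORDER BY other_field
LIMIT 5;
```

On a very small or empty table the planner may still prefer a sequential scan plus sort, so the plan is best compared on a realistically sized data set.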

Tags: performance, database, indexing, postgresql

girlwhocode
