 

Are unique indexes better for column search performance? (PGSQL & MySQL)

I am curious as to whether

CREATE INDEX idx ON tbl (columns); 

vs.

CREATE UNIQUE INDEX idx ON tbl (columns); 

has a significant algorithmic performance benefit in PostgreSQL or MySQL implementations when scanning the indexed column(s), or whether the UNIQUE keyword simply introduces a unique constraint alongside the index.

I imagine there is probably a marginal benefit, insofar as indexes are likely implemented internally as some sort of hash-like structure [1], and collision handling by definition results in something other than O(1) performance. Given this premise, if a large percentage of values are identical, the structure likely degenerates into something linear.

So, for purposes of my question, assume that the distribution of values is relatively discrete and uniform.

Thanks in advance!

[1] This is pure speculation on my part, as I am not familiar with RDBMS internals.

Alex Balashov asked Aug 18 '09 12:08




2 Answers

If your data are unique, you should create a UNIQUE index on them.

This implies no additional overhead and affects optimizer's decisions in certain cases so that it can choose a better algorithm.

In SQL Server and in PostgreSQL, for instance, if you sort on a UNIQUE key, the optimizer ignores any ORDER BY expressions that follow it (since they are irrelevant), i.e. this query:

SELECT  *
FROM    mytable
ORDER BY
        col_unique, other_col
LIMIT 10

will use an index on col_unique and won't sort on other_col because it's useless.

This query:

SELECT  *
FROM    mytable
WHERE   mycol IN
        (
        SELECT  othercol
        FROM    othertable
        )

will also be converted into an INNER JOIN (as opposed to a SEMI JOIN) if there is a UNIQUE index on othertable.othercol.
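The rewrite is safe because, when othercol is unique, each mytable row can match at most one othertable row, so a plain join cannot multiply rows. Here is a minimal sketch of that equivalence. It uses Python's built-in sqlite3 module purely as a portable stand-in (an assumption, not PostgreSQL or SQL Server), and it compares result sets only, not what either planner does. The table and column names come from the query above; the data and the helper name in_vs_join are made up.

```python
import sqlite3

def in_vs_join(other_values):
    """Return (rows via IN, rows via INNER JOIN) for the given othertable contents."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (mycol INTEGER)")
    con.execute("CREATE TABLE othertable (othercol INTEGER)")
    # Illustrative data only (hypothetical).
    con.executemany("INSERT INTO mytable VALUES (?)", [(1,), (2,), (3,)])
    con.executemany("INSERT INTO othertable VALUES (?)", [(v,) for v in other_values])
    # Semi join: a mytable row is returned once if any match exists.
    semi = con.execute(
        "SELECT mycol FROM mytable "
        "WHERE mycol IN (SELECT othercol FROM othertable) ORDER BY mycol"
    ).fetchall()
    # Inner join: a mytable row is returned once per matching othertable row.
    inner = con.execute(
        "SELECT m.mycol FROM mytable m "
        "JOIN othertable o ON o.othercol = m.mycol ORDER BY m.mycol"
    ).fetchall()
    con.close()
    return semi, inner
```

With unique values in othertable the two result lists are identical. With a duplicated value the inner join returns the matching mytable row more than once, which is why the optimizer needs the uniqueness guarantee before it can substitute one form for the other.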

An index always contains some kind of a pointer to the row (ctid in PostgreSQL, row pointer in MyISAM, primary key/uniquifier in InnoDB) and the leaves are ordered on these pointers, so in fact every index leaf is unique in some way (though it may not be obvious).
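A rough sketch of that idea follows. It assumes a sorted, B-tree-style index (the default index type in both PostgreSQL and InnoDB) rather than the hash-like structure the question speculates about, and it models the leaf level as a plain sorted Python list. The entries and the lookup helper are hypothetical; real engines differ in many details.

```python
from bisect import bisect_left

# Hypothetical leaf entries of a non-unique index: (key, row_pointer) pairs.
# The key 20 repeats, but each (key, row_pointer) pair is still distinct.
entries = sorted([(10, 3), (20, 1), (20, 7), (20, 9), (30, 2)])

def lookup(entries, key):
    """Binary-search to the first entry for key, then scan adjacent entries."""
    # -1 is assumed to sort before any real row pointer.
    i = bisect_left(entries, (key, -1))
    rows = []
    while i < len(entries) and entries[i][0] == key:
        rows.append(entries[i][1])
        i += 1
    return rows
```

In this model a lookup costs a logarithmic search plus one step per matching row, so duplicate keys add work only in proportion to the number of rows actually returned. With a unique key the scan simply stops after one entry.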

See this article in my blog for performance details:

  • Making an index UNIQUE
Quassnoi answered Sep 27 '22 20:09


There is a small penalty during update/insert operations for having the unique constraint. It has to search before the insert/update operation to make sure the uniqueness constraint isn't violated.
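The check itself is easy to observe. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (an assumption for portability; PostgreSQL and MySQL report the violation with their own error types). The index definition mirrors the one in the question, the column name col and the helper try_insert are hypothetical, and the sketch shows only that the check happens, not how much it costs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (col INTEGER)")
con.execute("CREATE UNIQUE INDEX idx ON tbl (col)")
con.execute("INSERT INTO tbl VALUES (1)")

def try_insert(value):
    """Attempt an insert; return False if the unique index rejects it."""
    try:
        con.execute("INSERT INTO tbl VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        # The engine looked the value up in the index and found a duplicate.
        return False
```

Calling try_insert with a new value succeeds, while calling it with 1 is rejected, because the engine has to probe the index for an existing entry before it can accept the row.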

Eric answered Sep 27 '22 18:09