If I know an index will have unique values, how will declaring it as such affect performance on inserts or selects?
If the optimiser knows the index is unique, how will that affect the query plan?
I understand that specifying uniqueness can serve to preserve integrity, but leaving that discussion aside for the moment, what are the performance consequences?
Unique indexes are indexes that help maintain data integrity by ensuring that no rows of data in a table have identical key values. When you create a unique index for an existing table with data, values in the columns or expressions that comprise the index key are checked for uniqueness.
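Here is a minimal sketch of that check. It uses Python's built-in sqlite3 module as a stand-in engine, so this is SQLite, not SQL Server or Oracle. The table t and the index ux_t_val are hypothetical names. Building a unique index over data that already contains a duplicate fails. Once the index exists, inserting a duplicate is rejected:

```python
import sqlite3


def try_unique_index(values):
    """Return True if a UNIQUE index can be built over `values`, else False."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table t with a single integer column.
        conn.execute("CREATE TABLE t (val INTEGER NOT NULL)")
        conn.executemany("INSERT INTO t (val) VALUES (?)", [(v,) for v in values])
        # The existing rows are checked for uniqueness while the index is built.
        conn.execute("CREATE UNIQUE INDEX ux_t_val ON t (val)")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


def duplicate_insert_rejected():
    """Return True if a second insert of the same key fails once the index exists."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (val INTEGER NOT NULL)")
        conn.execute("CREATE UNIQUE INDEX ux_t_val ON t (val)")
        conn.execute("INSERT INTO t (val) VALUES (1)")
        try:
            conn.execute("INSERT INTO t (val) VALUES (1)")
        except sqlite3.IntegrityError:
            return True
        return False
    finally:
        conn.close()


print(try_unique_index([1, 2, 3]))  # True
print(try_unique_index([1, 2, 2]))  # False
print(duplicate_insert_rejected())  # True
```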
The unique piece is not where the difference lies. The index and the key are not the same thing, and are not comparable. A key is a data column, or several columns, that are forced to be unique with a constraint: either a primary key or an explicitly defined unique constraint.
SQL Server does not require a clustered index to be unique, but it must have some means of uniquely identifying every row. That's why, for non-unique clustered indexes, SQL Server adds a 4-byte integer value called a uniqueifier to every duplicate instance of a clustering key value.
The syntax to create an index in SQL is:
CREATE [UNIQUE] INDEX index_name ON table_name (column1, column2, ... column_n);
where the optional UNIQUE keyword makes the index reject duplicate key values.
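Both forms of that syntax are sketched below, again with Python's sqlite3 module as a stand-in engine. The users table and its index names are hypothetical. SQLite's PRAGMA index_list reports whether each index was declared UNIQUE, which is the kind of metadata an optimizer can consult:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: email is expected to be unique, city is not.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, city TEXT)")

# CREATE UNIQUE INDEX index_name ON table_name (column1, ...);
conn.execute("CREATE UNIQUE INDEX ux_users_email ON users (email)")
# Plain CREATE INDEX index_name ON table_name (column1, ...), UNIQUE omitted.
conn.execute("CREATE INDEX ix_users_city ON users (city)")

# Each PRAGMA index_list row starts with (seq, name, unique, ...).
unique_flags = {row[1]: bool(row[2]) for row in conn.execute("PRAGMA index_list(users)")}
print(unique_flags["ux_users_email"])  # True
print(unique_flags["ix_users_city"])   # False
```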
Long story short: if your data are intrinsically UNIQUE, you will benefit from creating a UNIQUE index on them.
See the article in my blog for a detailed explanation of UNIQUE indexes.
Now, the gory details.
As @Mehrdad said, UNIQUENESS affects the estimated row count in the plan builder.
A UNIQUE index has the maximal possible selectivity, which is why this query:
SELECT * FROM table1 t1, table2 t2 WHERE t1.id = :myid AND t2.unique_indexed_field = t1.value
will almost surely use NESTED LOOPS, while this one:
SELECT * FROM table1 t1, table2 t2 WHERE t1.id = :myid AND t2.non_unique_indexed_field = t1.value
may benefit from a HASH JOIN if the optimizer thinks that non_unique_indexed_field is not selective.
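To make the row-estimate argument concrete, here is a deliberately simplified sketch of how a planner might turn uniqueness into a join choice. The cost constants, formulas and function names are hypothetical and do not reflect SQL Server's or Oracle's actual cost model. With a unique index, an equality probe returns at most one row. For a non-unique index, the estimate here is rows / distinct values, assuming a uniform distribution:

```python
# Hypothetical cost constants for this sketch only (not any real optimizer's).
SEEK_COST = 1.0        # cost of one index seek
RANDOM_ROW_COST = 4.0  # cost of fetching one row located through the index
SCAN_ROW_COST = 1.0    # cost of reading one row during a full scan


def estimated_rows_per_lookup(table_rows, distinct_values, unique):
    """Rows expected per equality probe into the inner table's index."""
    if unique:
        return 1.0  # a unique index returns at most one row per key
    return table_rows / distinct_values  # uniform-distribution assumption


def choose_join(outer_rows, table_rows, distinct_values, unique):
    """Pick a join strategy with a toy cost model."""
    per_probe = estimated_rows_per_lookup(table_rows, distinct_values, unique)
    # Nested loops: one seek per outer row, plus a random fetch per matching row.
    nested_loops = outer_rows * (SEEK_COST + per_probe * RANDOM_ROW_COST)
    # Hash join: scan the inner table once to build the hash table, then probe it.
    hash_join = table_rows * SCAN_ROW_COST + outer_rows
    return "NESTED LOOPS" if nested_loops <= hash_join else "HASH JOIN"


# t1.id = :myid yields one outer row; table2 is assumed to hold 1,000,000 rows.
print(choose_join(1, 1_000_000, 1_000_000, unique=True))  # NESTED LOOPS
print(choose_join(1, 1_000_000, 2, unique=False))         # HASH JOIN
```

A non-unique column with many distinct values still gets nested loops in this model. What changes with UNIQUE is that the "at most one row" estimate is guaranteed rather than guessed from statistics.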
If your index is CLUSTERED (i.e. the rows themselves are contained in the index leaves) and non-UNIQUE, then a special hidden column called a uniqueifier is added to the index key and filled in for duplicate key values, making those keys larger and the index slower.
That's why a UNIQUE CLUSTERED index is in fact a little more efficient than a non-UNIQUE CLUSTERED one.
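Here is a toy model of the uniqueifier, assuming the behaviour described above: a 4-byte integer attached to duplicate clustering key values. The function names are hypothetical. The byte count ignores any row-format overhead. This is not SQL Server's actual on-disk format. It only shows that the extra key bytes appear exactly when keys repeat:

```python
from collections import Counter

UNIQUEIFIER_BYTES = 4  # size of the uniqueifier value, per the description above


def assign_uniqueifiers(keys):
    """Return (key, uniqueifier) pairs: the first occurrence of a key gets None,
    later duplicates get 1, 2, ... (toy model only)."""
    seen = Counter()
    result = []
    for key in keys:
        n = seen[key]
        result.append((key, None if n == 0 else n))
        seen[key] += 1
    return result


def extra_key_bytes(keys):
    """Extra bytes spent on uniqueifiers: 4 per duplicate row, 0 if all keys are unique."""
    return sum(UNIQUEIFIER_BYTES for _, u in assign_uniqueifiers(keys) if u is not None)


print(assign_uniqueifiers([10, 20, 10, 10]))  # [(10, None), (20, None), (10, 1), (10, 2)]
print(extra_key_bytes([10, 20, 10, 10]))      # 8
print(extra_key_bytes([10, 20, 30]))          # 0
```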
In Oracle, a join on a UNIQUE INDEX is required for so-called key preservation, which ensures that each row from a table will be selected at most once and makes a view updatable.
This query:
UPDATE ( SELECT t2.value, t1.other_value FROM mytable t1, mytable t2 WHERE t2.reference = t1.unique_indexed_field ) SET value = other_value
will work in Oracle, while this one:
UPDATE ( SELECT t2.value, t1.other_value FROM mytable t1, mytable t2 WHERE t2.reference = t1.non_unique_indexed_field ) SET value = other_value
will fail.
This is not an issue with SQL Server, though.
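To see why the unique index matters, here is a toy check of key preservation in Python. The function is hypothetical. The updated table t2 is key-preserved when each t2 row matches at most one t1 row, and a unique index on the t1 join column guarantees that. Oracle decides this statically from unique indexes and constraints rather than by inspecting the data. It therefore rejects the non-unique version (ORA-01779) even if the current rows happen to be distinct. The data-level check below only illustrates what could go wrong:

```python
from collections import Counter


def is_key_preserved(t1_join_values, t2_references):
    """Return True if every t2 row matches at most one t1 row in the join
    t2.reference = t1.<join column>, i.e. no t2 row appears twice in the result."""
    t1_counts = Counter(t1_join_values)
    return all(t1_counts[ref] <= 1 for ref in t2_references)


# Unique t1 join column: each t2 row can be updated unambiguously.
print(is_key_preserved([1, 2, 3], [1, 1, 3]))  # True
# Duplicate t1 value 1: the t2 row referencing 1 would get two candidate values.
print(is_key_preserved([1, 1, 2], [1, 2]))     # False
```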
One more thing: for a table like this,
CREATE TABLE t_indexer (id INT NOT NULL PRIMARY KEY, uval INT NOT NULL, ival INT NOT NULL);
CREATE UNIQUE INDEX ux_indexer_ux ON t_indexer (uval);
CREATE INDEX ix_indexer_ux ON t_indexer (ival);
this query:
/* Sorts on the non-unique index first */
SELECT TOP 1 * FROM t_indexer ORDER BY ival, uval;
will use a TOP N SORT, while this one:
/* Sorts on the unique index first */
SELECT TOP 1 * FROM t_indexer ORDER BY uval, ival;
will use just an index scan.
For the latter query, there is no point in additionally sorting on ival, since the uval values are unique anyway, and the optimizer takes this into account.
On sample data of 200,000 rows (id == uval == ival), the former query runs for 15 seconds, while the latter one is instant.
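The reasoning behind that plan can be checked on data. The sketch below recreates t_indexer in SQLite through Python's sqlite3 module. SQLite is a stand-in here, so the sketch says nothing about SQL Server's TOP N SORT or its timings. It fills the table with hypothetical rows where uval is unique and ival repeats. It then confirms that ORDER BY uval, ival returns exactly the same order as ORDER BY uval:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_indexer (id INTEGER NOT NULL PRIMARY KEY, "
    "uval INTEGER NOT NULL, ival INTEGER NOT NULL)"
)
conn.execute("CREATE UNIQUE INDEX ux_indexer_ux ON t_indexer (uval)")
conn.execute("CREATE INDEX ix_indexer_ux ON t_indexer (ival)")

# Hypothetical data: uval is a shuffled permutation (unique), ival has many duplicates.
uvals = list(range(1000))
random.seed(42)
random.shuffle(uvals)
conn.executemany(
    "INSERT INTO t_indexer (id, uval, ival) VALUES (?, ?, ?)",
    [(i, u, i % 7) for i, u in enumerate(uvals)],
)

by_uval = conn.execute("SELECT id FROM t_indexer ORDER BY uval").fetchall()
by_uval_ival = conn.execute("SELECT id FROM t_indexer ORDER BY uval, ival").fetchall()
# uval never ties, so the trailing ival sort key can never change the order.
print(by_uval == by_uval_ival)  # True
```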
Of course the optimizer will take uniqueness into consideration. It affects the expected row count in query plans.