Should every User Table have a Clustered Index?

Recently I found a couple of tables in a database with no clustered index defined. They do have non-clustered indexes, so the tables are heaps.

On analysis I found that the SELECT statements against them filter on the columns covered by those non-clustered indexes.

Does not having a clustered index on these tables affect performance?

Sreedhar asked Aug 03 '12 00:08



3 Answers

It's hard to state this more succinctly than SQL Server MVP Brad McGehee:

As a rule of thumb, every table should have a clustered index. Generally, but not always, the clustered index should be on a column that monotonically increases–such as an identity column, or some other column where the value is increasing–and is unique. In many cases, the primary key is the ideal column for a clustered index.

BOL echoes this sentiment:

With few exceptions, every table should have a clustered index.

The reasons for doing this are many and are primarily based upon the fact that a clustered index physically orders your data in storage.

  • If your clustered index is on a single column that monotonically increases, new rows are appended at the end of the table in order, so inserts largely avoid page splits.

  • Clustered indexes are efficient for finding a specific row when the indexed value is unique, such as the common pattern of selecting a row based upon the primary key.

  • A clustered index allows efficient queries on columns that are frequently searched for ranges of values (BETWEEN, >, etc.).

  • Clustering can speed up queries where data is commonly sorted by a specific column or columns.

  • A clustered index can be rebuilt or reorganized on demand to control table fragmentation.

  • These benefits can even be applied to views.
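The first bullet's point about insert order and page splits can be illustrated with a toy model. This is only a sketch, not SQL Server's actual storage engine: the page capacity of 8 keys, the 50/50 split rule, and the function name `count_page_splits` are all assumptions made for the illustration.

```python
import bisect
import random


def count_page_splits(keys, page_capacity=8):
    """Insert keys into ordered, fixed-capacity pages; return (splits, pages)."""
    pages = [[]]  # pages are kept in key order; each page is a sorted list
    splits = 0
    for key in keys:
        # Locate the page whose key range should hold this key.
        first_keys = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(first_keys, key)
        page = pages[idx]
        if len(page) < page_capacity:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            # Appending past the end: start a new page, no rows are moved.
            pages.append([key])
        else:
            # Page is full and the key belongs in the middle: move the upper
            # half of the rows to a new page, then insert the key.
            mid = len(page) // 2
            new_page = page[mid:]
            del page[mid:]
            pages.insert(idx + 1, new_page)
            splits += 1
            target = new_page if key >= new_page[0] else page
            bisect.insort(target, key)
    return splits, len(pages)


n = 2000
# Identity-like keys: strictly increasing.
seq_splits, seq_pages = count_page_splits(range(n))
# GUID-like keys: unique values arriving in random order.
random_keys = random.Random(42).sample(range(10**6), n)
rand_splits, rand_pages = count_page_splits(random_keys)

print("increasing keys:", seq_splits, "splits,", seq_pages, "pages")
print("random keys:    ", rand_splits, "splits,", rand_pages, "pages")
```

In this model the increasing keys never trigger a split and leave every page full, while the random keys split repeatedly and spread the same number of rows over more, partly empty pages. That is the kind of fragmentation an index rebuild or reorganize cleans up.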

You may not want to have a clustered index on:

  • Columns that have frequent data changes, as SQL Server must then physically re-order the data in storage.

  • Columns that are already covered by other indexes.

  • Wide keys, as the clustered index is also used in non-clustered index lookups.

  • GUID columns, which are larger than identity values and effectively random (so not likely to be sorted upon), though newsequentialid() could be used to help mitigate physical reordering during inserts.

  • A rare reason to use a heap (table without a clustered index) is if the data is always accessed through nonclustered indexes and the RID (SQL Server internal row identifier) is known to be smaller than a clustered index key.
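The wide-key and RID points above come down to simple arithmetic: every nonclustered index row carries a row locator, which is the clustered key on a clustered table and an 8-byte RID on a heap. The sketch below is a back-of-the-envelope estimate only. The row count, the number of indexes, the 60-byte "wide key" and the helper `locator_overhead_mb` are hypothetical, and the calculation ignores row and page overhead, compression, and the uniquifier added to non-unique clustered keys.

```python
RID_BYTES = 8  # heap row identifier: file id (2) + page id (4) + slot (2)

# Size in bytes of some candidate clustered keys.
KEY_BYTES = {
    "int identity": 4,
    "bigint identity": 8,
    "uniqueidentifier (GUID)": 16,
    "hypothetical wide key (60 bytes)": 60,
}


def locator_overhead_mb(locator_bytes, rows, nonclustered_indexes):
    """Space spent on row locators across all nonclustered indexes, in MB."""
    return locator_bytes * rows * nonclustered_indexes / (1024 * 1024)


ROWS = 10_000_000   # assumed table size
NC_INDEXES = 4      # assumed number of nonclustered indexes

print(f"heap (RID): {locator_overhead_mb(RID_BYTES, ROWS, NC_INDEXES):.0f} MB")
for name, size in KEY_BYTES.items():
    print(f"{name}: {locator_overhead_mb(size, ROWS, NC_INDEXES):.0f} MB")
```

Under these assumptions, a clustered key narrower than 8 bytes costs less per nonclustered index row than a RID, while a GUID costs twice as much, and the gap grows with every additional nonclustered index.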

Because of these and other considerations, such as your particular application workloads, you should carefully select your clustered indexes to get maximum benefit for your queries.

Also note that when you create a primary key on a table in SQL Server, it will by default create a unique clustered index (if it doesn't already have one). This means that if you find a table that doesn't have a clustered index, but does have a primary key (as all tables should), a developer had previously made the decision to create it that way. You may want to have a compelling reason to change that (of which there are many, as we've seen). Adding, changing or dropping the clustered index requires rewriting the entire table and any non-clustered indexes, so this can take some time on a large table.

Tim Lehner answered Oct 11 '22 19:10


I would not say "Every table should have a clustered index". I would say "Look carefully at every table and how it is accessed, and define a clustered index on it if it makes sense". It's a plus, like a Joker: you have only one Joker per table, but you don't have to use it. Other database systems don't have this, at least in this form, BTW.

Putting clustered indices everywhere without understanding what you're doing can also kill your performance. It is usually INSERT performance that suffers, because a clustered index means physical re-ordering on the disk (or at least that's a good way to understand it). GUID primary keys, which we see more and more, are a typical example.

So, read Tim Lehner's exceptions and reason.

Simon Mourier answered Oct 11 '22 19:10


Performance is a big hairy problem. Make sure you are optimizing for the right thing.

Free advice is always worth its price, and there is no substitute for actual experimentation.

The purpose of an index is to find matching rows and help retrieve the data when found.

A non-clustered index on your search criteria will help to find rows, but an additional lookup is then needed to get at the rest of the row's data.

If there is no clustered index, SQL Server uses an internal row identifier (RID) to point to the location of the data.

However, if there is a clustered index on the table, that RID is replaced by the key values of the clustered index.

So when a query needs only the non-clustered index columns plus the clustered key columns, the step of reading the row's data is not needed: those values are already covered by the index.

Even if a clustered index key isn't very selective, it may be helpful to have it in the leaf of the non-clustered index if those key columns are frequently most or all of the columns a query requests.
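A toy model of the two row-locator styles may make this concrete. It is a sketch under stated assumptions, not how SQL Server is implemented: the Orders rows, the column layout (OrderId, CustomerId, Total) and all function names are hypothetical, OrderId stands in for the clustered key, and CustomerId for the non-clustered index column.

```python
# Hypothetical Orders rows: (OrderId, CustomerId, Total).
ROWS = [
    (101, 7, 25.0),
    (102, 3, 12.5),
    (103, 7, 80.0),
    (104, 9, 5.0),
    (105, 3, 42.0),
    (106, 7, 19.9),
]


def build_heap(rows):
    """Heap: rows sit at arbitrary RIDs; index entries are (CustomerId, RID)."""
    heap = dict(enumerate(rows))
    index = sorted((row[1], rid) for rid, row in heap.items())
    return heap, index


def build_clustered(rows):
    """Clustered table: index entries are (CustomerId, OrderId)."""
    table = {row[0]: row for row in rows}
    index = sorted((row[1], row[0]) for row in rows)
    return table, index


def order_ids_on_heap(heap, index, customer_id):
    """The RID says nothing about OrderId, so each match needs a row read."""
    order_ids, row_reads = [], 0
    for cust, rid in index:
        if cust == customer_id:
            order_ids.append(heap[rid][0])
            row_reads += 1
    return order_ids, row_reads


def order_ids_on_clustered(index, customer_id):
    """The clustered key is already in the index entry: no row reads."""
    return [oid for cust, oid in index if cust == customer_id], 0


heap, heap_index = build_heap(ROWS)
_, clustered_index = build_clustered(ROWS)
heap_result = order_ids_on_heap(heap, heap_index, 7)
clustered_result = order_ids_on_clustered(clustered_index, 7)
print("heap:     ", heap_result)
print("clustered:", clustered_result)
```

Both variants return the same OrderIds, but only the heap variant has to read the rows themselves. If the query also asked for Total, the clustered variant would need a lookup through the clustered index as well, since Total is not part of either key.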

Rawheiser answered Oct 11 '22 19:10