
Clustered vs Non-Clustered

My lower-level knowledge of SQL (Server 2008) is limited, and is now being challenged by our DBAs. Let me explain the scenario (I have stated some things that seem obvious to me in the hope that I am right, but if you see something wrong, please tell me):

We have a table which holds 'Court Orders' for people. When I created the table, (Name: CourtOrder), I created it like:

    CREATE TABLE dbo.CourtOrder
    (
        CourtOrderID INT NOT NULL IDENTITY(1,1),  -- primary key
        PersonId     INT NOT NULL
        -- + around 20 other fields of different types
    )

I then applied a non-clustered index to the primary key (for efficiency). My reasoning is that it is a unique field (primary key) and should be indexed, mainly for selection purposes, as we often select from the table where primary key = ...

I then applied a CLUSTERED index on PersonId. The reason was to group orders for a particular person physically, as the vast majority of work is getting orders for a person. So, select from mytable where personId = ...
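As a rough T-SQL sketch, the two indexes described above would look something like this. The constraint and index names (PK_CourtOrder, CIX_CourtOrder_PersonId) are placeholders rather than the real ones, and the other columns are left out:

```sql
-- Sketch of the current setup; PK_CourtOrder and CIX_CourtOrder_PersonId
-- are hypothetical names used only for illustration.
CREATE TABLE dbo.CourtOrder
(
    CourtOrderID INT NOT NULL IDENTITY(1,1),
    PersonId     INT NOT NULL,
    -- ... around 20 other columns omitted
    CONSTRAINT PK_CourtOrder PRIMARY KEY NONCLUSTERED (CourtOrderID)
);

-- Non-unique clustered index: rows are ordered by PersonId.
CREATE CLUSTERED INDEX CIX_CourtOrder_PersonId
    ON dbo.CourtOrder (PersonId);
```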

I have been pulled up on this now. I have been told that we should put the clustered index on the primary key, and the normal index on the PersonId. That seems very strange to me. First off, why would you put a clustered index on a unique column? What is it clustering? Surely that's a waste of the clustered index? I'd have believed a normal index would be used on a unique column. Also, putting the clustered index there would mean we can't cluster on a different column (one per table, right?).

The reasoning for me being told I have made a mistake is that they believe putting a clustered index on the PersonId would make inserts slow. For the 5% gain in speed of a select, we would be getting a 95% degradation in speed on inserts and updates. Is that correct and valid?

They say that because we cluster on PersonId, SQL Server has to rearrange data whenever we insert a row or change a PersonId.

So then I asked, why would SQL Server have the concept of a CLUSTERED INDEX if it's so slow? Is it as slow as they're saying? How should I have set up my indexes to achieve optimum performance? I'd have thought SELECT is used more than INSERT... but they say that we're having locking issues on INSERTs...

Hope someone can help me.

Craig asked Sep 30 '11 03:09




2 Answers

The distinction between a clustered vs. non-clustered index is that the clustered index determines the physical order of the rows in the database. In other words, applying the clustered index to PersonId means that the rows will be physically sorted by PersonId in the table, allowing an index search on this to go straight to the row (rather than a non-clustered index, which would direct you to the row's location, adding an extra step).

That said, it's unusual for the primary key not to be the clustered index, but not unheard of. The issue with your scenario is actually the opposite of what you're assuming: you want unique values in a clustered index, not duplicates. Because the clustered index determines the physical order of the rows, if the index is on a non-unique column, then the server has to add a hidden value (SQL Server calls it a uniquifier) to rows that have a duplicate key value (in your case, any rows with the same PersonId) so that the combined value (key + hidden value) is unique.

The only thing I would suggest is not using the surrogate key column (your CourtOrderID) as the primary key, but instead using a compound primary key of PersonId and some other uniquely-identifying column or set of columns. If that's not possible (or not practical), though, then put the clustered index on CourtOrderID.
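For the fallback option (clustered index on CourtOrderID, normal index on PersonId), the change might look roughly like the sketch below. It assumes the existing primary key constraint is called PK_CourtOrder and the existing clustered index is called CIX_CourtOrder_PersonId; both names, and IX_CourtOrder_PersonId, are made up for illustration, so substitute your own. It also assumes no foreign keys reference the primary key; if any do, they would have to be dropped and recreated around this.

```sql
-- Hypothetical names throughout; run in a maintenance window, since
-- rebuilding the clustered index rewrites the whole table.

-- 1. Drop the non-unique clustered index on PersonId (table becomes a heap).
DROP INDEX CIX_CourtOrder_PersonId ON dbo.CourtOrder;

-- 2. Drop the nonclustered primary key constraint.
ALTER TABLE dbo.CourtOrder DROP CONSTRAINT PK_CourtOrder;

-- 3. Recreate the primary key as the clustered index.
ALTER TABLE dbo.CourtOrder
    ADD CONSTRAINT PK_CourtOrder PRIMARY KEY CLUSTERED (CourtOrderID);

-- 4. Add a nonclustered index to support lookups by person.
CREATE NONCLUSTERED INDEX IX_CourtOrder_PersonId
    ON dbo.CourtOrder (PersonId);
```

Dropping the clustered index first means the nonclustered primary key is only rebuilt once before it is dropped, instead of being rebuilt again in a later step.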

Adam Robinson answered Oct 15 '22 13:10



I am by no means a SQL expert, so take this as a developer's view rather than a DBA's view.

Inserts into a clustered (physically ordered) index that don't arrive in sequential order cause extra work for inserts and updates. Also, if you have many inserts happening at once and they are all occurring in the same location, you end up with contention. Your specific performance varies based on your data and how you access it. The general rule of thumb is to build your clustered index on a narrow, unique value in your table (typically the PK).

I'm assuming your PersonId won't be changing, so updates don't come into play here. But consider a snapshot of a few rows with PersonId values of 1, 2, 3, 3, 4, 5, 6, 7, 8, 8.

Now insert 20 new rows for a PersonId of 3. First, since this is not a unique key, the server adds some extra bytes to your value (behind the scenes) to make it unique, which also takes extra space. Then room has to be made for the new rows next to the existing rows for PersonId 3, which can mean moving rows that are already there. Compare that to inserting on an auto-incrementing PK, where the inserts happen at the end. The non-technical explanation would likely come down to this: there is less 'leaf-shuffling' work to do when naturally increasing values are added at the end of the table than when existing items have to be relocated to fit your new items in the middle.

Now, if you are having issues with inserts, then you are likely inserting a bunch of the same (or similar) PersonId values at once, which causes this extra work in various places throughout the table, and the fragmentation is killing you. Switching the clustered index to the PK has a possible downside in your case: if your inserts today are on PersonId values spread throughout the table, then after the switch all of the inserts will happen in one location, and your problem may actually get worse because the contention becomes concentrated there. (On the flip side, if your inserts today are not spread out all over but are typically bunched in similar areas, then your problem will likely ease by switching your clustered index away from PersonId to your PK, because you'll be minimizing the fragmentation.)
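If you want to see whether fragmentation really is the issue, one way (a sketch, assuming the table is dbo.CourtOrder in the current database) is to ask the sys.dm_db_index_physical_stats function, which exists in SQL Server 2005 and later:

```sql
-- Report fragmentation for every index on dbo.CourtOrder.
-- 'LIMITED' is the cheapest scan mode; it reads only the upper index levels.
DECLARE @db_id INT, @object_id INT;
SET @db_id     = DB_ID();
SET @object_id = OBJECT_ID(N'dbo.CourtOrder');

SELECT  i.name AS index_name,
        ps.index_type_desc,
        ps.avg_fragmentation_in_percent,
        ps.page_count
FROM    sys.dm_db_index_physical_stats(@db_id, @object_id, NULL, NULL, 'LIMITED') AS ps
JOIN    sys.indexes AS i
        ON  i.object_id = ps.object_id
        AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

A high avg_fragmentation_in_percent on the clustered index, on a table with a meaningful page_count, would be consistent with the mid-table inserts described above. On a very small table the number means little.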

Your performance problems should be analyzed for your specific situation, so take these types of answers as general guidelines only. Your best bet is to rely on a DBA who can validate exactly where your problems lie. It sounds like you have resource contention issues that may be beyond a simple index tweak. This could be a symptom of a much larger problem (likely design issues, otherwise resource limitations).
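As a starting point for that analysis, and only as a sketch under the assumption that the table is dbo.CourtOrder in the current database, the sys.dm_db_index_operational_stats function exposes per-index lock and latch wait counters:

```sql
-- Per-index insert activity and lock/latch waits for dbo.CourtOrder.
-- The counters are cumulative and are reset when the instance restarts,
-- so compare two snapshots rather than reading absolute values.
DECLARE @db_id INT, @object_id INT;
SET @db_id     = DB_ID();
SET @object_id = OBJECT_ID(N'dbo.CourtOrder');

SELECT  i.name AS index_name,
        os.leaf_insert_count,
        os.leaf_allocation_count,   -- leaf page allocations, which include page splits
        os.row_lock_wait_count,
        os.row_lock_wait_in_ms,
        os.page_lock_wait_count,
        os.page_lock_wait_in_ms,
        os.page_latch_wait_count,
        os.page_latch_wait_in_ms
FROM    sys.dm_db_index_operational_stats(@db_id, @object_id, NULL, NULL) AS os
JOIN    sys.indexes AS i
        ON  i.object_id = os.object_id
        AND i.index_id  = os.index_id
ORDER BY os.page_latch_wait_in_ms DESC;
```

Roughly speaking, large lock waits point toward blocking between transactions, while large page latch waits on one index point toward many sessions hitting the same pages. Neither number on its own tells you which index layout is right.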

In any case, good luck!

Darian Miller answered Oct 15 '22 14:10
