
Clustered index - multi-part vs single-part index and effects of inserts/deletes

This question is about what happens with the reorganizing of data in a clustered index when an insert is done. I assume that it should be more expensive to do inserts on a table which has a clustered index than one that does not because reorganizing the data in a clustered index involves changing the physical layout of the data on the disk. I'm not sure how to phrase my question except through an example I came across at work.

Assume there is a table (Junk) and two queries are run against it: the first searches by Name and the second searches by Name and Something. While working on the database, I discovered that the table had been created with two indexes, one to support each query, like so:

--drop table Junk1
CREATE TABLE Junk1
(
    Name char(5),  
    Something char(5),
    WhoCares int
)

CREATE CLUSTERED INDEX IX_Name ON Junk1
(
    Name
)

CREATE NONCLUSTERED INDEX IX_Name_Something ON Junk1
(
    Name, Something
)

Now when I looked at the two indexes, it seems that IX_Name is redundant since IX_Name_Something can be used by any query that desires to search by Name. So I would eliminate IX_Name and make IX_Name_Something the clustered index instead:

--drop table Junk2
CREATE TABLE Junk2
(
    Name char(5),  
    Something char(5),
    WhoCares int
)

CREATE CLUSTERED INDEX IX_Name_Something ON Junk2
(
    Name, Something
)

Someone suggested that the first indexing scheme should be kept since it would result in more efficient inserts/deletes (assume that there is no need to worry about updates for Name and Something). Would that make sense? I think the second indexing method would be better since it means one less index needs to be maintained.

I would appreciate any insight into this specific example or directing me to more info on maintenance of clustered indexes.

asked May 27 '10 20:05 by Anssssss




1 Answer

Yes, inserting into the middle of an existing table (or its page) can be expensive when you have a less than optimal clustered index. The worst case is a page split: about half the rows on the full page have to be moved to a new page, and the index structure pointing at those pages needs to be updated. On top of that, every non-clustered index on the table has to be updated for the new row.

You can alleviate that problem by using the right clustered index - one that ideally is:

  • narrow (only a single field, as small as possible)
  • static (never changes)
  • unique (so that SQL Server doesn't need to add 4-byte uniqueifiers to your rows)
  • ever-increasing (like an INT IDENTITY)

You want a narrow key (ideally a single INT) since each and every entry in each and every non-clustered index will also contain the clustering key(s) - you don't want to put lots of columns in your clustering key, nor do you want to put things like VARCHAR(200) there!

With an ever-increasing clustering key, new rows always go at the end of the table, so inserts never have to split a page in the middle. The only fragmentation you could encounter is from deletes (the "swiss cheese" problem).
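To see how much fragmentation a given key choice actually produces, one option is to query the `sys.dm_db_index_physical_stats` function. The following is only a sketch: it assumes the question's `Junk1` table lives in the `dbo` schema of the current database, and it uses the `'LIMITED'` scan mode, which is the cheapest one.

```sql
-- Sketch: list fragmentation for every index on dbo.Junk1.
-- Assumption: the table from the question exists in the dbo schema
-- of the database this query runs in.
SELECT  i.name AS index_name,
        ps.index_type_desc,
        ps.avg_fragmentation_in_percent,
        ps.page_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(),                    -- current database
            OBJECT_ID(N'dbo.Junk1'),    -- table to inspect
            NULL,                       -- all indexes on the table
            NULL,                       -- all partitions
            'LIMITED') AS ps            -- cheapest scan mode
JOIN    sys.indexes AS i
        ON  i.object_id = ps.object_id
        AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

Running this before and after a batch of inserts or deletes gives a rough picture of how the clustered and non-clustered indexes are affected. On very small tables the percentages are not very meaningful.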

Check out Kimberly Tripp's excellent blog posts on indexing - most notably:

  • GUIDs as PRIMARY KEYs and/or the clustering key
  • The Clustered Index Debate Continues... - this one actually shows that a good clustered index will speed up all operations, including inserts and deletes, compared to a heap with no clustered index!
  • Ever-increasing clustering key - the Clustered Index Debate..........again!

"Assume there is a table (Junk) and there are two queries that are done on the table, the first query searches by Name and the second query searches by Name and Something. As I'm working on the database I discovered that the table has been created with two indexes, one to support each query, like so:"

That's definitely not necessary - if you have one index on (Name, Something), that index can be used just as well when you search and restrict on only WHERE Name = 'abc', because Name is the leading column. A separate index on just the Name column is not needed; it only wastes space and costs time to keep up to date.
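As a small illustration, here are the two queries from the question written against the `Junk2` table. The literal values `'abc'` and `'xyz'` are made up for the example, and the plan the optimizer really picks depends on statistics and table size, so treat the comments as expectations rather than guarantees.

```sql
-- Query 1: search by Name only.
-- Name is the leading column of IX_Name_Something (Name, Something),
-- so this predicate can seek on that index.
SELECT Name, Something, WhoCares
FROM   Junk2
WHERE  Name = 'abc';

-- Query 2: search by Name and Something.
-- Both columns of the index key are restricted, so this can seek as well.
SELECT Name, Something, WhoCares
FROM   Junk2
WHERE  Name = 'abc'
  AND  Something = 'xyz';

-- For contrast: a filter on Something alone skips the leading column,
-- so it cannot seek on (Name, Something) and would have to scan.
SELECT Name, Something, WhoCares
FROM   Junk2
WHERE  Something = 'xyz';
```

Comparing the execution plans of these statements is a quick way to confirm that the single composite index covers both access patterns.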

So basically, you only need a single index on (Name, Something), and I would agree with you: if you have no other indices on this table, you should be able to make it the clustered index. However, since that key won't be ever-increasing and could possibly change, too (right?), this might not be such a great idea.

The other option would be to introduce a surrogate ID INT IDENTITY and cluster on that - with two benefits:

  • it's everything a good clustering key should be, including ever-increasing, so INSERT operations won't cause page splits in the middle of the table
  • you still get all the benefits of having a clustering key (see Kim Tripp's blog posts - clustered tables are almost always preferable to heaps)
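A minimal sketch of that surrogate-key layout might look like this. `Junk3`, `PK_Junk3`, and `IX_Junk3_Name_Something` are hypothetical names chosen for illustration; the three data columns are copied from the question.

```sql
-- Hypothetical third variant of the question's table:
-- cluster on a narrow, static, unique, ever-increasing surrogate key.
CREATE TABLE Junk3
(
    ID        INT IDENTITY(1,1) NOT NULL,  -- surrogate key, assigned by SQL Server
    Name      char(5),
    Something char(5),
    WhoCares  int,
    CONSTRAINT PK_Junk3 PRIMARY KEY CLUSTERED (ID)
);

-- One non-clustered index supports both queries from the question:
-- search by Name, and search by Name and Something.
CREATE NONCLUSTERED INDEX IX_Junk3_Name_Something ON Junk3
(
    Name, Something
);
```

The trade-off is that the table again carries two indexes, but the clustered one only ever grows at the end, and the non-clustered index stores just a 4-byte ID as its row locator.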
answered Oct 19 '22 23:10 by marc_s