Clustered index - multi-part vs single-part index and effects of inserts/deletes

Tags:

This question is about what happens with the reorganizing of data in a clustered index when an insert is done. I assume that it should be more expensive to do inserts on a table which has a clustered index than one that does not because reorganizing the data in a clustered index involves changing the physical layout of the data on the disk. I'm not sure how to phrase my question except through an example I came across at work.

Assume there is a table (Junk) and there are two queries that are done on the table, the first query searches by Name and the second query searches by Name and Something. As I'm working on the database I discovered that the table has been created with two indexes, one to support each query, like so:

--drop table Junk1
CREATE TABLE Junk1
(
    Name char(5),  
    Something char(5),
    WhoCares int
)

CREATE CLUSTERED INDEX IX_Name ON Junk1
(
    Name
)

CREATE NONCLUSTERED INDEX IX_Name_Something ON Junk1
(
    Name, Something
)

Now when I looked at the two indexes, it seems that IX_Name is redundant since IX_Name_Something can be used by any query that desires to search by Name. So I would eliminate IX_Name and make IX_Name_Something the clustered index instead:

--drop table Junk2
CREATE TABLE Junk2
(
    Name char(5),  
    Something char(5),
    WhoCares int
)

CREATE CLUSTERED INDEX IX_Name_Something ON Junk2
(
    Name, Something
)

Someone suggested that the first indexing scheme should be kept since it would result in more efficient inserts/deletes (assume that there is no need to worry about updates for Name and Something). Would that make sense? I think the second indexing method would be better since it means one less index needs to be maintained.

I would appreciate any insight into this specific example or directing me to more info on maintenance of clustered indexes.

698

asked May 27 '10 20:05

Anssssss

1 Answers

Yes, inserting into the middle of an existing table (or its page) could be expensive when you have a less than optimal clustered index. Worst case would be a page split : half the rows on the page would have to be moved elsewhere, and indices (including non-clustered indices on that table) need to be updated.

You can alleviate that problem by using the right clustered index - one that ideally is:

narrow (only a single field, as small as possible)
static (never changes)
unique (so that SQL Server doesn't need to add 4-byte uniqueifiers to your rows)
ever-increasing (like an INT IDENTITY)

You want a narrow key (ideally a single INT) since each and every entry in each and every non-clustered index will also contain the clustering key(s) - you don't want to put lots of columns in your clustering key, nor do you want to put things like VARCHAR(200) there!

With an ever increasing clustered index, you will never see the case of a page split. The only fragmentation you could encounter is from deletes ("swiss cheese" problem).

Check out Kimberly Tripp's excellet blog posts on indexing - most notably:

GUIDs as PRIMARY KEYs and/or the clustering key
The Clustered Index Debate Continues... - this one actually shows that a good clustered index will speed up all operations - including inserts, delete etc., compared to a heap with no clustered index!
Ever-increasing clustering key - the Clustered Index Debate..........again!

Assume there is a table (Junk) and there are two queries that are done on the table, the first query searches by Name and the second query searches by Name and Something. As I'm working on the database I discovered that the table has been created with two indexes, one to support each query, like so:

That's definitely not necessary - if you have one index on (Name, Something), that index can also and just as well be used if you search and restrict on just WHERE Name = abc - having a separate index with just the Name column is totally not needed and only wastes space (and costs time to be kept up to date).

So basically, you only need a single index on (Name, Something), and I would agree with you - if you have no other indices on this table, then you should be able to make this the clustered key. Since that key won't be ever-increasing and could possibly change, too (right?), this might not be such a great idea.

The other option would be to introduce a surrogate ID INT IDENTITY and cluster on that - with two benefits:

it's all a good clustered key should be, including ever-increasing -> you'll never have any issues with page splits and performance for INSERT operations
you still get all the benefits of having a clustering key (see Kim Tripps' blog posts - clustered tables are almost always preferable to heaps)

144

answered Oct 19 '22 23:10

marc_s

Related questions
                            
                                How to create a unique index containing multiple fields where one is a foreign key
                            
                                What are the consequences of not closing database connection after an error?
                            
                                Dates with no time or timezone component in Java/MySQL
                            
                                in SQL, or Django ORM, what's the conventional way to have an ordered one-to-many?
                            
                                Why does the SqlServer optimizer get so confused with parameters?
                            
                                Preferred database design method for assigning user roles? (Hats vs. Groups)
                            
                                SQL Server 2005, turn columns into rows
                            
                                Should CONTROL permission be given on a Stored Procedure in SQL Server 2005?
                            
                                is it better to put more logic in your ON clause or should it only have the minimum necessary?
                            
                                What are the best practices on formatting inline sql using ADO.NET in C#
                            
                                Using IsolationLevel.Snapshot but DB is still locking
                            
                                complex sql order by
                            
                                SQL Query, Selecting 5 most recent in each group
                            
                                Need help with the Merge statement
                            
                                Oracle bug? SELECT returns no dupes, INSERT from SELECT has duplicate rows
                            
                                SQL Count(*) on multiple tables
                            
                                Stored procedure executing another stored procedure
                            
                                correct way to create a pivot table in postgresql using CASE WHEN
                            
                                How to Deal with SET ANSI_NULLS ON or OFF?
                            
                                Verify two columns of two different tables match exactly

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Clustered index - multi-part vs single-part index and effects of inserts/deletes

Tags:

sql

sql-server

indexing

clustered-index

Anssssss

People also ask

1 Answers

marc_s

Recent Activity

Donate For Us