Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sql Server Legacy Database To Clustered index or not

We have a legacy database which is a sql server db (2005, and 2008).

All of the primary keys in the tables are UniqueIdentifiers.

The tables currently have no clustered index created on them and we are running into performance issues on tables with only 750k records. This is the first database i've worked on with unique identifiers as the sole primary key and I've never seen sql server be this slow with returning data.

I don't want to create a clustered index on the uniqueidentifier as they are not sequential and will therefore slow the apps down when it comes to inserting data.

We cannot remove the uniqueidentifier as that is used for remote site record identity management purposes.

I had thought about adding a big integer identity column to the tables and creating the clustered index on this column and including the unique identifier column.

i.e.

int identity - First column to maintain insert speeds unique identifier - To ensure the application keeps working as expected.

The goal is to improve the identity query and joined table query performance.

Q1: Will this improve the query performance of the db or will it slow it down?

Q2: Is there an alternative to this that I haven't listed?

Thanks Pete

Edit: The performance issues are on retrieving data quickly through select statements, especially if a few of the more "transactional / changing" tables are joined together.

Edit 2: The joins between tables are generally all between the primary key and foreign keys, for tables that have foreign keys they are included in the non-clustered index to provide a more covering index.

The tables all have no other values which would provide a good clustered index.

I'm leaning more towards adding an additional identity column on each of the high load tables and then including the current Guid PK column within the clustered index to provide the best query performance.

Edit 3: I would estimate that 80% of the queries are performed on primary and foreign keys alone through the data access mechanism. Generally our data model has lazy loaded objects which perform the query when accessed, these queries use the objects id and the PK column. We have a large amount of user driven data exclusion / inclusion queries which use the foreign key columns as a filter based on the criteria of for type X exclude the following id's. The remaining 20% is where clauses on Enum (int) or date range columns, very few text based queries are performed in the system.

Where possible I have already added covering indexes to cover the heaviest queries, but as yet i'm still dissapointed by the performance. As bluefooted says the data is being stored as a heap.

like image 683
Peter Avatar asked Aug 20 '10 21:08

Peter


People also ask

Should you always have a clustered index?

As a rule of thumb, every table should have a clustered index. Generally, but not always, the clustered index should be on a column that monotonically increases–such as an identity column, or some other column where the value is increasing–and is unique.

Should my index be clustered or nonclustered?

SELECT OperationsIf you want to select only the index value that is used to create and index, non-clustered indexes are faster. For example, if you have created an index on the “name” column and you want to select only the name, non-clustered indexes will quickly return the name.

What is the main advantage of a clustered index over a non-clustered index?

Cluster index offers faster data access, on the other hand, the Non-clustered index is slower.


1 Answers

If you don't have a clustered index on the table, it is being stored as a heap rather than a b-tree. Heap data access is absolutely atrocious in SQL Server so you definitely need to add a clustered index.

I agree with your analysis that the GUID column is a poor choice for clustering, especially since you don't have the ability to use NEWSEQUENTIALID(). You could create a new artificial integer key if you like, but if there is another column or combination of columns that would make sense as a clustered index, that is fine as well.

Do you have a field that is used frequently for range scans? Which columns are used for joins? Is there a combination of columns that also uniquely identifies the row aside from the GUID? Posting a sample of the data model would help us to suggest a good candidate for clustering.

like image 110
Pam Lahoud Avatar answered Oct 23 '22 09:10

Pam Lahoud