Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance difference between Primary Key and Unique Clustered Index in SQL Server

We have 2 tables that have about 40M rows. the size of database is about 20GB, most are for these 2 tables. Everyday, We need delete some data, i.e. about 10M rows. So, we are using batch delete to keep the log file within certain size.

originally, there is no primary key for the table. But has unique, clustered index for each table. the delete takes for ever. i.e. it takes about 2-3 hours to delete 500K rows on a virtual machine. * before delete, the index was rebuilt.

now, we converted the unique, clustered index to primary key. it takes about 20-30 minutes to delete 2M rows.

I understand there is difference between primary key and unique clustered index, but why the performance is so different?

anyone has some insight?

thanks

like image 715
urlreader Avatar asked Feb 19 '23 19:02

urlreader


1 Answers

Rolling my 8-Ball: if you declared a non-clustered primary key (as it seems to suggest from your post) then on each batch you would very likely hit the index tipping point. Thus each batch would do a full scan of 40M rows to delete the batch size. Then, on the next batch, again a full scan. And so on until your 10M would be deleted. With a clustered key the batches should scan only the actual rows being deleted (of course I assume your batch delete criteria would actually use the clustered key...). As you see, there are many unknowns when one starts guessing...

But ultimately... you have a performance question and you should investigate it using the performance troubleshooting techniques. Capture the execution plans, the wait stats, the statistics io. Follow a methodology like Waits and Queues. Measure. Don't listen to guesses from someone on the internet that just rolled an 8-Ball...

like image 192
Remus Rusanu Avatar answered Mar 02 '23 00:03

Remus Rusanu