Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cluster the index on ever-increasing datetime column on logging table?

I'm not a DBA ("Good!", you'll be thinking in a moment.)

I have a table of logging data with these characteristics and usage patterns:

  • A datetime column for storing log timestamps whose value is ever-increasing and mostly (but only mostly) unique
  • Frequent-ish inserts (say, a dozen a minute), only at the end of the timestamp range (new data being logged)
  • Infrequent deletes, in bulk, from the beginning of the timestamp range (old data being cleared)
  • No updates at all
  • Frequent-ish selects using the timestamp column as the primary criterion, along with secondary criteria on other columns
  • Infrequent selects using other columns as the criteria (and not including the timestamp column)
  • A good amount of data, but nowhere near enough that I'm worried much about storage space

Additionally, there is currently a daily maintenance window during which I could do table optimization.

I frankly don't expect this table to challenge the server it's going to be on even if I mis-index it a bit, but nevertheless it seemed like a good opportunity to ask for some input on SQL Server clustered indexes.

I know that clustered indexes determine the storage of the actual table data (the data is stored in the leaf nodes of the index itself), and that non-clustered indexes are separate pointers into the data. So in query terms, a clustered index is going to be faster than a non-clustered index -- once we've found the index value, the data is right there. There are costs on insert and delete (and of course an update changing the clustered index column's value would be particularly costly).

But I read in this answer that deletes leave gaps that don't get cleaned up until/unless the index is rebuilt.

All of this suggests to me that I should:

  • Put a clustered index on the timestamp column with a 100% fill-factor
  • Put non-clustered indexes on any other column that may be used as a criterion in a query that doesn't also involve the clustered column (which may be any of them in my case)
  • Schedule the bulk deletes to occur during the daily maintenance interval
  • Schedule a rebuild of the clustered index to occur immediately after the bulk delete
  • Relax and get out more

Am I wildly off base there? Do I need to frequently rebuild the index like that to avoid lots of wasted space? Are there other obvious (to a DBA) things I should be doing?

Thanks in advance.

like image 237
T.J. Crowder Avatar asked Feb 07 '10 18:02

T.J. Crowder


Video Answer


1 Answers

Contrary to what a lot of people believe, having a good clustered index on a table can actually make operations like INSERTs faster - yes, faster!

Check out the seminal blog post The Clustered Index Debate Continues.... by Kimberly Tripp - the ultimate indexing queen.

She mentions (about in the middle of the article):

Inserts are faster in a clustered table (but only in the "right" clustered table) than compared to a heap. The primary problem here is that lookups in the IAM/PFS to determine the insert location in a heap are slower than in a clustered table (where insert location is known, defined by the clustered key). Inserts are faster when inserted into a table where order is defined (CL) and where that order is ever-increasing.

The crucial point is: only with the right clustered index will you be able to reap the benefits - when a clustered index is unique, narrow, stable and optimally ever-increasing. This is best served with an INT IDENTITY column.

Kimberly Tripp also has a great article on how to pick the best possible clustering key for your tables, and what criteria it should fulfil - see her post entitled Ever-increasing clustering key - the Clustered Index Debate..........again!

If you have such a column - e.g. a surrogate primary key - use that for your clustering key and you should see very nice performance on your table - even on lots of INSERTs.

like image 116
marc_s Avatar answered Oct 03 '22 05:10

marc_s