
Best practices and anti-patterns in creating indexes in SQL Server?

What are the things that you would consider when defining indexes, clustered and non-clustered, for SQL Server? Are there any anti-patterns that DB newbies should be aware of? Please explain the "Why" or provide references if possible.

Buu Nguyen asked Dec 09 '08 02:12



1 Answer

An index is basically a "cheat sheet". It lets the DBMS find a particular value (or range of values) on disk without scanning the whole table. You generally pay a small penalty on INSERT / UPDATE / DELETE for each index, because the index has to be updated along with the table, but it is rarely large enough to be a bottleneck on its own. A good DBMS will only use an index when it helps query performance, so there aren't many hugely negative anti-patterns here; extra indexes don't usually hurt much, unless you're talking about very write-heavy transactional tables. That said, it pays to index deliberately so that the really important indexes are in place, and the best way to find out which ones those are is to profile your application.
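To make the "without scanning the whole table" point concrete, here is a minimal sketch that asks a database for its query plan before and after adding an index. It uses SQLite only because it ships with Python; it is not SQL Server, and the `orders` table, its columns, and the index name `ix_orders_customer` are all made up for illustration. SQL Server shows the same distinction in its execution plans as a scan versus an index seek.

```python
import sqlite3

# In-memory SQLite database standing in for a real server (assumption:
# SQLite is used purely because it needs no setup; this is not SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


query = "SELECT * FROM orders WHERE customer_id = 42"

before = plan(query)  # no index on customer_id yet: full table scan
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
after = plan(query)   # the planner can now search via the new index

print("before:", before)
print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should mention a scan of `orders` and the second should name `ix_orders_customer`.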

The key to understanding when and when not to use indexes is to get a grasp on what they're really doing under the covers. In a nutshell, you want them when the selectivity of the index is high (i.e. the number of distinct values in the column is high compared to the number of rows in the relation). So, for example, if you have a table with 10,000 rows and a column called "color" that's either "red" or "blue", an index doesn't help much, because each value matches about half the rows and the DBMS will probably have to load most of the pages into memory anyway (assuming a random distribution). Conversely, an index on the primary key id of a table (which is nearly always added automatically) will make lookups in that table lightning fast, on the order of log(n), because only a very small number of nodes in the tree have to be examined to find the page on disk where the record resides.
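The selectivity comparison in the example above can be worked out in a few lines. This is only a sketch: the helper names `selectivity` and `avg_rows_per_value` are made up here, and the data is synthetic, built to match the 10,000-row "color" example rather than taken from a real table.

```python
def selectivity(values):
    """Distinct values divided by row count; 1.0 means every row is unique."""
    return len(set(values)) / len(values)


def avg_rows_per_value(values):
    """Average number of rows an equality lookup on this column would match."""
    return len(values) / len(set(values))


ROWS = 10_000

# Synthetic columns matching the example: two colors, and a unique id.
color = ["red" if i % 2 == 0 else "blue" for i in range(ROWS)]
ids = list(range(1, ROWS + 1))

for name, column in (("color", color), ("id", ids)):
    print(name, selectivity(column), avg_rows_per_value(column))
```

For "color" a lookup matches 5,000 rows on average, so the index saves almost nothing; for "id" it matches exactly one row, which is where an index pays off.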

Indexes in most modern database systems are implemented as a B+ tree, a very cool variant of the B-tree that's optimized for slow secondary storage (disks instead of memory). You can get a good introduction to their use and functionality from Database Systems: The Complete Book.
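A rough way to see why a wide tree suits slow storage is to count how many levels a lookup has to walk. The sketch below is a back-of-the-envelope estimate, not how any particular DBMS sizes its pages: the function name `tree_levels` is made up, and the fanout of 200 keys per node is an assumed figure chosen purely for illustration.

```python
def tree_levels(num_keys, fanout):
    """Smallest number of levels such that fanout ** levels >= num_keys.

    Uses integer arithmetic to avoid floating-point rounding in logarithms.
    """
    levels = 1
    capacity = fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


# fanout=2 models a plain binary tree; fanout=200 is an assumed node width
# for a B+ tree whose nodes each fill one disk page.
for n in (10_000, 1_000_000, 1_000_000_000):
    print(n, tree_levels(n, fanout=2), tree_levels(n, fanout=200))
```

Under these assumptions a billion keys need 30 levels in a binary tree but only 4 in the wide tree, and each level is roughly one page read, which is the point of optimizing for secondary storage.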

Ian Varley answered Sep 19 '22 23:09