
SQL Server Index performance - long column

In SQL Server (2005+) I need to index a column (exact matches only) that is nvarchar(2000+). What is the most scalable, performant way to approach this?

In SQL Server (2005+), what would be the practical difference between indexing a column of each of the following types:

  • nvarchar(2000)
  • char(40)
  • binary(16)

For example, would a lookup against an indexed binary(16) column be measurably faster than one against an indexed nvarchar(2000)? If so, by how much?

Obviously smaller is always better in some regard, but I am not familiar enough with how SQL Server optimizes its indexes to know how it deals with length.

asked Jul 06 '09 by Rex M


3 Answers

You're thinking about this from the wrong direction:

  • Do create indexes you need to meet performance goals
  • Do NOT create indexes you don't need

Whether a column is binary(16) or nvarchar(2000) makes little difference there, because you don't just add indexes willy-nilly.

Don't let index choice dictate your column types. If you need to index an nvarchar(2000), consider a full-text index, or add a hash value for the column and index that.


Based on your update, I would probably create either a checksum column or a computed column using the HashBytes() function and index that. Note that a checksum is not the same as a cryptographic hash, so you are somewhat more likely to get collisions; on the other hand, you can still match against the entire contents of the text column, and the index will do the filtering first. HashBytes() is less likely to produce collisions, but collisions are still possible, so you still need to compare the actual column. HashBytes() is also more expensive, since the hash has to be computed for every query and for every change.
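As a rough sketch of the HashBytes() approach (the table and column names Documents and Body are assumptions for illustration, not from the question):

-- Add a hashed, indexable surrogate for the long column.
-- CAST to binary(20) (the SHA-1 output size) keeps the index key well under 900 bytes.
ALTER TABLE Documents
    ADD BodyHash AS CAST(HASHBYTES('SHA1', Body) AS binary(20)) PERSISTED

CREATE INDEX IX_Documents_BodyHash ON Documents (BodyHash)

-- Exact-match lookup: seek on the hash first, then compare the full
-- column to rule out hash collisions.
DECLARE @SearchText nvarchar(2000)
SET @SearchText = N'the exact text you are looking for'

SELECT d.*
FROM   Documents AS d
WHERE  d.BodyHash = CAST(HASHBYTES('SHA1', @SearchText) AS binary(20))
  AND  d.Body     = @SearchText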

answered Sep 22 '22 by Joel Coehoorn

OF COURSE a binary(16) will be MUCH faster - just do the quickest of calculations:

  • a SQL Server page is always 8K
  • if you have 16 bytes per entry, you can store 500 entries on a page
  • with 4000 bytes per entry (nvarchar), you'll end up with 2 entries per page (worst case, if your NVARCHAR(2000) values are fully populated)

If you have a table with 100,000 entries, you'll need 200 pages for the index with a binary(16) key, but 50,000 pages for the same index with an nvarchar(2000) key.

Even just the added I/O to read and scan all those pages is going to kill whatever performance you might have had.
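If you want to see how this plays out on a real table, the sys.dm_db_index_physical_stats DMV (available from SQL Server 2005 onward) reports the actual depth, page count and fragmentation of each index; the table name dbo.Recipe below is just a placeholder:

-- One row per index: depth, leaf-level page count and fragmentation
SELECT i.name AS index_name,
       ips.index_depth,
       ips.page_count,
       ips.avg_fragmentation_in_percent
FROM   sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Recipe'), NULL, NULL, NULL) AS ips
JOIN   sys.indexes AS i
  ON   i.object_id = ips.object_id
 AND   i.index_id  = ips.index_id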

Marc

UPDATE:
For my usual indexes, I try to avoid compound indexes as much as I can - referencing them from other tables just gets rather messy (WHERE clauses with several equality comparisons).

Also, regularly check and maintain your indexes: if fragmentation is above 30%, rebuild; if it's between 5% and 30%, reorganize. Check out an automatic, well-tested DB index maintenance script at http://sqlfool.com/2009/06/index-defrag-script-v30/
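The two maintenance operations themselves look like this (the index and table names are placeholders):

-- 5-30% fragmentation: reorganize (lightweight, always online)
ALTER INDEX IX_Recipe_TextChecksum ON dbo.Recipe REORGANIZE

-- more than 30% fragmentation: rebuild
ALTER INDEX IX_Recipe_TextChecksum ON dbo.Recipe REBUILD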

For the clustered key on a SQL Server table, try to avoid GUIDs: they are random in nature and can cause massive index fragmentation, which hurts performance. Also, while not a hard requirement, try to make sure your clustered key is unique; if it's not, SQL Server adds a four-byte uniqueifier to it. The clustered key also gets added to each and every entry in each and every non-clustered index, so it's extremely important to have a small, unique, stable (non-changing) column there, optimally one that is ever-increasing, which gives you the best characteristics and performance; an INT IDENTITY is perfect.
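As a sketch of what such a clustered key looks like in practice (the table is hypothetical):

-- Small, unique, stable, ever-increasing clustered key
CREATE TABLE dbo.Recipe
(
    RecipeId   int IDENTITY(1,1) NOT NULL,
    RecipeText nvarchar(2000)    NOT NULL,
    CONSTRAINT PK_Recipe PRIMARY KEY CLUSTERED (RecipeId)
)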

answered Sep 25 '22 by marc_s


You can have at most 900 bytes per index entry, so your nvarchar(2000) won't fly. The biggest difference will be index depth - the number of pages to traverse from the root to the leaf page. So, if you need to search, you can index on CHECKSUM, like this:

alter table recipe add text_checksum as checksum(recipe_text)
create index text_checksum_ind on recipe(text_checksum)

(the example comes from Indexes on Computed Columns: Speed Up Queries, Add Business Rules). This will not give you an exact match on its own, but it narrows the search down very well.
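A lookup would then filter on the indexed checksum first and compare the full text to discard collisions; something along these lines, reusing the table and columns from the snippet above (the @SearchText variable is just for illustration):

DECLARE @SearchText nvarchar(2000)
SET @SearchText = N'the exact recipe text'

SELECT r.*
FROM   recipe AS r
WHERE  r.text_checksum = CHECKSUM(@SearchText)  -- index seek on the checksum
  AND  r.recipe_text   = @SearchText            -- exact comparison weeds out collisions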

Of course, if you need to enforce uniqueness, you'll have to use triggers.
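A rough sketch of such a trigger, assuming the recipe table has an id primary key column (that column name is an assumption):

CREATE TRIGGER trg_recipe_text_unique
ON recipe
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS (
        SELECT 1
        FROM   recipe AS r
        JOIN   inserted AS i
          ON   r.text_checksum = i.text_checksum  -- cheap, indexed pre-filter
         AND   r.recipe_text   = i.recipe_text    -- exact comparison
         AND   r.id           <> i.id             -- 'id' is an assumed key column
    )
    BEGIN
        RAISERROR('Duplicate recipe_text value.', 16, 1)
        ROLLBACK TRANSACTION
    END
END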

Another idea is to zip your nvarchar to a smaller binary value, and index on that, but can you guarantee that every value is always zipped to 900 bytes or less?

answered Sep 25 '22 by A-K