
How to get MS SQL Server to transparently use a CHECKSUM/hash index?

It appears that SQL Server does not automatically use a CHECKSUM/hash index unless the CHECKSUM column is explicitly included in the query's search arguments. This is a problem because I do not control the applications that query the table, and I must not degrade their performance.

Is there any way to get SQL Server to use a new CHECKSUM/hash index without modifying queries to include the new CHECKSUM/hash column?

Repro script

CREATE TABLE big_table
(
    id BIGINT IDENTITY CONSTRAINT pk_big_table PRIMARY KEY,
    wide_col VARCHAR(50),
    wide_col_checksum AS CHECKSUM(wide_col),
    other_col INT
)

CREATE INDEX ix_checksum ON big_table (wide_col_checksum)

Insert some test data:

SET NOCOUNT ON
DECLARE @count INT = 0
BEGIN TRANSACTION
WHILE @count < 10000
BEGIN
    SET @count = @count + 1
    INSERT INTO big_table (wide_col, other_col) 
    VALUES (SUBSTRING(master.dbo.fn_varbintohexstr(CRYPT_GEN_RANDOM(25)), 3, 50), @count)
    IF @count % 1000 = 0
    BEGIN
        COMMIT TRANSACTION
        BEGIN TRANSACTION
    END
END
COMMIT TRANSACTION

INSERT INTO big_table (wide_col, other_col) 
VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZ', 9999999)

Legacy query. Causes Clustered Index Scan (BAD):

SELECT * FROM big_table 
WHERE wide_col = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'



Updated query. Causes NonClustered Index Seek (good):

SELECT * FROM big_table 
WHERE wide_col = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
AND wide_col_checksum = CHECKSUM('ABCDEFGHIJKLMNOPQRSTUVWXYZ')

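Note that the equality predicate on wide_col must stay in the updated query even though the seek happens on the checksum: CHECKSUM returns a 32-bit value, so distinct strings can collide. A quick sanity check (a sketch against the repro table above) lists any checksum values already shared by more than one string:

```sql
-- CHECKSUM is only 32 bits, so collisions are expected at scale.
-- List checksum values that map to more than one distinct string:
SELECT wide_col_checksum, COUNT(DISTINCT wide_col) AS distinct_values
FROM big_table
GROUP BY wide_col_checksum
HAVING COUNT(DISTINCT wide_col) > 1
```

Any rows returned here are exactly the cases the residual `wide_col = ...` predicate filters out after the index seek.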

Background

My table is very large (many hundreds of millions of rows), has several indexes (~ 20), all of which are required. Some of the indexed columns are a little wide (~ 50 bytes) and have few duplicate values. The columns are only searched on equality. The table is inserted into constantly.

Here is a table comparing "normal" indexes and CHECKSUM/hash indexes on the sample table above, both compressed and non-compressed. Data from freshly rebuilt indexes on tables with 1 million rows:

[Table image: "Hash indexes and compression" — index sizes for normal vs. CHECKSUM/hash indexes, compressed and uncompressed]

Page compression alone is fairly ineffective on the sample data (real data should compress somewhat better). The hash index achieves a 4X reduction in index size; page compression on top of the hash index achieves a 6X reduction.
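For reference, page compression can be applied to the existing hash index with a rebuild (standard SQL Server 2008+ syntax):

```sql
-- Rebuild the hash index with page compression enabled
ALTER INDEX ix_checksum ON big_table
REBUILD WITH (DATA_COMPRESSION = PAGE)
```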

My aims with using hash indexes are:

  1. Reduce the size of these indexes in memory, thereby allowing SQL Server to cache a greater fraction in RAM, thereby avoiding physical reads.
  2. Reduce index storage size.
  3. Reduce index I/O for INSERT operations.
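To track progress against aims 1 and 2, per-index footprint can be measured from the standard DMVs; this sketch sums used pages per index on the sample table (8 KB per page):

```sql
-- Used space per index on big_table, in KB
SELECT i.name AS index_name,
       SUM(ps.used_page_count) * 8 AS used_kb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE ps.object_id = OBJECT_ID('dbo.big_table')
GROUP BY i.name
```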
asked Jun 18 '12 by deleto

People also ask

How checksum works in SQL Server?

CHECKSUM computes a hash value, called the checksum, over its argument list. This hash value can be used to build a hash index: add a computed column over the CHECKSUM of the columns, then build an index on that computed column. The result can serve equality searches over the original columns.

How do I generate a hash key in SQL Server?

First of all, we have to make sure that the field or column we have used to preserve password for store the hash code is of data type varbinary. Then, use the HashBytes function in the insert statement to generate the hash for the password and store it in the column.
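As a minimal illustration of that pattern (column and variable names here are hypothetical, not from the question):

```sql
-- HASHBYTES returns VARBINARY; SHA2_256 produces 32 bytes (SQL Server 2012+),
-- so the storing column should be VARBINARY(32).
DECLARE @password NVARCHAR(128) = N'example-password'
SELECT HASHBYTES('SHA2_256', @password) AS password_hash
```

In practice the password should be salted before hashing; HASHBYTES alone is shown only to demonstrate the function.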

How does a hash index work in SQL?

A hash index is a data structure that can be used to accelerate database queries. It works by mapping input records into an array of buckets via a hash function; records that hash to the same value land in the same bucket.

How does a hash index work?

Basically, a hash index is an array of N buckets or slots, each containing a pointer to a row. Hash indexes use a hash function F(K, N): given a key K and the number of buckets N, the function maps the key to the corresponding bucket of the hash index.


2 Answers

If your application queries:

SELECT * FROM big_table WHERE wide_col = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

You need an index on wide_col, not on wide_col_checksum.
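Under this answer's premise, the index to add is a plain B-tree index on the searched column; page compression can optionally be applied to trim its footprint somewhat:

```sql
-- Ordinary B-tree index on the column the legacy queries actually filter on
CREATE INDEX ix_wide_col ON big_table (wide_col)
WITH (DATA_COMPRESSION = PAGE)  -- optional
```

This makes the legacy query seekable as-is, though it does not deliver the size reduction the hash index was meant to achieve.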

SQL Server stores nonclustered indexes as B-trees, so with such an index it can seek on wide_col directly. As @MartinSmith suggests, reducing the size of the columns in an index does decrease its memory and disk footprint.

answered Nov 15 '22 by Andomar

SQL Server does not automatically start using the checksum/hash index. The query has to reference the hashed column for SQL Server to consider the index, so I do not see how you can achieve your objective without making changes to the queries. It is an interesting question, though, and could make a good feature request for SQL Server.

answered Nov 15 '22 by sqlwithpanks