Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the differences between CHECKSUM() and BINARY_CHECKSUM() and when/what are the appropriate usage scenarios?

Tags:

Again MSDN does not really explain in plain English the exact difference, or the information for when to choose one over the other.

CHECKSUM

Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indexes.

BINARY_CHECKSUM

Returns the binary checksum value computed over a row of a table or over a list of expressions. BINARY_CHECKSUM can be used to detect changes to a row of a table.

It does hint that binary checksum should be used to detect row changes, but not why.

like image 336
Andrew Burns Avatar asked Aug 06 '09 17:08

Andrew Burns


People also ask

What is the purpose of checksum in SQL?

CHECKSUM computes a hash value, called the checksum, over its argument list. Use this hash value to build hash indexes. A hash index will result if the CHECKSUM function has column arguments, and an index is built over the computed CHECKSUM value. This can be used for equality searches over the columns.

What is BINARY_CHECKSUM?

BINARY_CHECKSUM and CHECKSUM are similar functions. They can be used to compute a checksum value on a list of expressions, and the order of expressions affects the resultant value. The order of columns used for BINARY_CHECKSUM(*) is the order of columns specified in the table or view definition.

What do you understand by checksum?

A checksum is a value that represents the number of bits in a transmission message and is used by IT professionals to detect high-level errors within data transmissions. Prior to transmission, every piece of data or file can be assigned a checksum value after running a cryptographic hash function.

What is checksum in SQL Server backup?

Checksum when used in backup command, will test the page checksums that exist on the data file pages being backed up. If a bad page checksum is found, the backup process will stop (by default). Please note checksum is not replacement of dbcc checkdb, not all corruption can be find out using checksum.


2 Answers

Check out the following blog post that highlights the diferences.

http://decipherinfosys.wordpress.com/2007/05/18/checksum-functions-in-sql-server-2005/

Adding info from this link:

The key intent of the CHECKSUM functions is to build a hash index based on an expression or a column list. If say you use it to compute and store a column at the table level to denote the checksum over the columns that make a record unique in a table, then this can be helpful in determining whether a row has changed or not. This mechanism can then be used instead of joining with all the columns that make the record unique to see whether the record has been updated or not. SQL Server Books Online has a lot of examples on this piece of functionality.

A couple of things to watch out for when using these functions:

You need to make sure that the column(s) or expression order is the same between the two checksums that are being compared else the value would be different and will lead to issues.

We would not recommend using checksum(*) since the value that will get generated that way will be based on the column order of the table definition at run time which can easily change over a period of time. So, explicitly define the column listing.

Be careful when you include the datetime data-type columns since the granularity is 1/300th of a second and even a small variation will result into a different checksum value. So, if you have to use a datetime data-type column, then make sure that you get the exact date + hour/min. i.e. the level of granularity that you want.

There are three checksum functions available to you:

CHECKSUM: This was described above.

CHECKSUM_AGG: This returns the checksum of the values in a group and Null values are ignored in this case. This also works with the new analytic function’s OVER clause in SQL Server 2005.

BINARY_CHECKSUM: As the name states, this returns the binary checksum value computed over a row or a list of expressions. The difference between CHECKSUM and BINARY_CHECKSUM is in the value generated for the string data-types. An example of such a difference is the values generated for “DECIPHER” and “decipher” will be different in the case of a BINARY_CHECKSUM but will be the same for the CHECKSUM function (assuming that we have a case insensitive installation of the instance). Another difference is in the comparison of expressions. BINARY_CHECKSUM() returns the same value if the elements of two expressions have the same type and byte representation. So, “2Volvo Director 20” and “3Volvo Director 30” will yield the same value, however the CHECKSUM() function evaluates the type as well as compares the two strings and if they are equal, then only the same value is returned.

Example: STRING              BINARY_CHECKSUM_USAGE    CHECKSUM_USAGE ------------------- ----------------------    ----------- 2Volvo Director 20  -1356512636                -341465450 3Volvo Director 30  -1356512636                -341453853 4Volvo Director 40  -1356512636                -341455363 
like image 99
John Sansom Avatar answered Sep 30 '22 14:09

John Sansom


HASHBYTES with MD5 is 5 times slower than CHECKSUM, I've tested this on a table with over 1 million rows, and ran each test 5 times to get an average.

Interestingly CHECKSUM takes exactly the same time as BINARY_CHECKSUM.

Here is my post with the full results published: http://networkprogramming.wordpress.com/2011/01/14/binary_checksum-vs-hashbytes-in-sql/

like image 30
user250718 Avatar answered Sep 30 '22 14:09

user250718