Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

byte representation in BINARY_CHECKSUM()?

Tags:

sql-server

according to this article

http://decipherinfosys.wordpress.com/2007/05/18/checksum-functions-in-sql-server-2005/

they say that

BINARY_CHECKSUM() returns the same value if the elements of two expressions have the same type and byte representation. So, “2Volvo Director 20″ and “3Volvo Director 30″ will yield the same value

my question is what is byte representation?

and why

SELECT BINARY_CHECKSUM('2Volvo Director 20'  )// -1356512636
SELECT BINARY_CHECKSUM('3Volvo Director 30'  )// -1356512636

gives the same results ?

the byte of '2' is not as the byte of '3'

like image 226
Royi Namir Avatar asked Sep 22 '11 13:09

Royi Namir


1 Answers

I don't know exactly what that article means by that phrasing, but from googling around I see that a SQL Server MVP has reverse-engineered the full algorithm (eg here) and others have noticed that the algorithm has weird problems around a 16-character cycle. Notice that in that sample, the changes occur at position 1 and position 17:

SELECT BINARY_CHECKSUM('2Volvo Director 20'  )-- -1356512636
SELECT BINARY_CHECKSUM('3Volvo Director 30'  )-- -1356512636
--                      12345678901234567                  

which are 16 apart - if the space is removed so the changes occur at positions 1 and 16, you get different checksums:

SELECT BINARY_CHECKSUM('2Volvo Director20'  )-- 1257395465
SELECT BINARY_CHECKSUM('3Volvo Director30'  )-- 1257395480
--                      12345678901234567                  

The internet consensus appears to be that the hash function offered by BINARY_CHECKSUM is of low quality, and that you should use HASHBYTES in preference.

like image 68
AakashM Avatar answered Sep 30 '22 01:09

AakashM