according to this article
http://decipherinfosys.wordpress.com/2007/05/18/checksum-functions-in-sql-server-2005/
they say that
BINARY_CHECKSUM() returns the same value if the elements of two expressions have the same type and byte representation. So, “2Volvo Director 20″ and “3Volvo Director 30″ will yield the same value
my question is what is byte representation
?
and why
SELECT BINARY_CHECKSUM('2Volvo Director 20' )// -1356512636
SELECT BINARY_CHECKSUM('3Volvo Director 30' )// -1356512636
gives the same results ?
the byte of '2' is not as the byte of '3'
I don't know exactly what that article means by that phrasing, but from googling around I see that a SQL Server MVP has reverse-engineered the full algorithm (eg here) and others have noticed that the algorithm has weird problems around a 16-character cycle. Notice that in that sample, the changes occur at position 1 and position 17:
SELECT BINARY_CHECKSUM('2Volvo Director 20' )-- -1356512636
SELECT BINARY_CHECKSUM('3Volvo Director 30' )-- -1356512636
-- 12345678901234567
which are 16 apart - if the space is removed so the changes occur at positions 1 and 16, you get different checksums:
SELECT BINARY_CHECKSUM('2Volvo Director20' )-- 1257395465
SELECT BINARY_CHECKSUM('3Volvo Director30' )-- 1257395480
-- 12345678901234567
The internet consensus appears to be that the hash function offered by BINARY_CHECKSUM
is of low quality, and that you should use HASHBYTES
in preference.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With