Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

TSQL CHECKSUM conundrum

SELECT BINARY_CHECKSUM('Clifton House, Thornaby Place, Teesdale South, Stockton-On-Tees, Cleveland, TS17 6SD')
SELECT BINARY_CHECKSUM('Clifton House, Teesdale South, Thornaby Place, Stockton-On-Tees, Cleveland, TS17 6SD')

SELECT BINARY_CHECKSUM('Glenfield Hospital, Groby Road, , Leicester, Leicestershire, LE3 9DZ')
SELECT BINARY_CHECKSUM('Glenfield Hospital, Groby Road, , Leicester, Leicestershire, LE3 9EJ')

Have a look at the above. The 2 pairs of addresses will generate the same checksum value even though there are differences in the text. It is my understanding that, whilst you cannot guarantee that CHECKSUM and BINARY_CHECKSUM will be different for any random content that they should be good for determining relatively small changes in a given row.

Interestingly these pairs of values demonstrate precisely the opposite. They are generating equal checksum values for very similar data values. These are in fact the only duplicate checksum values in a largish (680,000 record) table of addresses.

I am a little concerned that I have misunderstood the value of checksum in generating UPDATEs? Do I have to resort to a brute force field by field comparison to be absolutely certain of picking up a change in a row of data?

The original data for these examples was in 6 separate columns. I have reduced the code sample to a minimal state for clarity.

like image 245
Andrew Wiles Avatar asked Oct 06 '22 09:10

Andrew Wiles


2 Answers

Checksums are never completely foolproof. For a totally-always-correct-no-matter-what-solution brute force is the way. There are some less intense methods that can work though.

A checksum is pretty much a really simple hash. Try using hashbytes instead. then you can make use of a more chaotic algorithm like md5.

eg:

SELECT HASHBYTES('MD5', 'Glenfield Hospital, Groby Road, , Leicester, Leicestershire, LE3 9DZ')
like image 184
Sheena Avatar answered Oct 10 '22 03:10

Sheena


try using HASHBYTES() with MD5 or SHA1 instead of BINARY_CHECKSUM. read the help for HASHBYTES() first...

like image 38
Z.D. Avatar answered Oct 10 '22 03:10

Z.D.