SELECT BINARY_CHECKSUM('Clifton House, Thornaby Place, Teesdale South, Stockton-On-Tees, Cleveland, TS17 6SD')
SELECT BINARY_CHECKSUM('Clifton House, Teesdale South, Thornaby Place, Stockton-On-Tees, Cleveland, TS17 6SD')
SELECT BINARY_CHECKSUM('Glenfield Hospital, Groby Road, , Leicester, Leicestershire, LE3 9DZ')
SELECT BINARY_CHECKSUM('Glenfield Hospital, Groby Road, , Leicester, Leicestershire, LE3 9EJ')
Have a look at the above. The 2 pairs of addresses will generate the same checksum value even though there are differences in the text. It is my understanding that, whilst you cannot guarantee that CHECKSUM and BINARY_CHECKSUM will be different for any random content that they should be good for determining relatively small changes in a given row.
Interestingly these pairs of values demonstrate precisely the opposite. They are generating equal checksum values for very similar data values. These are in fact the only duplicate checksum values in a largish (680,000 record) table of addresses.
I am a little concerned that I have misunderstood the value of checksum in generating UPDATEs? Do I have to resort to a brute force field by field comparison to be absolutely certain of picking up a change in a row of data?
The original data for these examples was in 6 separate columns. I have reduced the code sample to a minimal state for clarity.
Checksums are never completely foolproof. For a totally-always-correct-no-matter-what-solution brute force is the way. There are some less intense methods that can work though.
A checksum is pretty much a really simple hash. Try using hashbytes instead. then you can make use of a more chaotic algorithm like md5.
eg:
SELECT HASHBYTES('MD5', 'Glenfield Hospital, Groby Road, , Leicester, Leicestershire, LE3 9DZ')
try using HASHBYTES() with MD5 or SHA1 instead of BINARY_CHECKSUM. read the help for HASHBYTES() first...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With