The following SQL and its result shows the different string got the same checksum
result. Why?
select str ,
binary_checksum(str) binary_checksum,
checksum(str) checksum,
hashbytes('md5', str) md5
from ( values ( '2Volvo Director 20'), ( '3Volvo Director 30'), ( '4Volvo Director 40') )
t ( str )
str binary_checksum checksum md5 ------------------ --------------- ----------- -------------------------------------------- 2Volvo Director 20 -1356512636 -383039272 0xB9BD78BCF70FAC36AF14FFF589767278 3Volvo Director 30 -1356512636 -383039272 0xF039462F3D15B162FFCDB6125D290826 4Volvo Director 40 -1356512636 -383039272 0xFAF315CDA6E453CCC09838CFB129EE74
The CHECKSUM value depends on the collation. The same value stored with a different collation will return a different CHECKSUM value. CHECKSUM () does not guarantee unique results.
A checksum is a value that represents the number of bits in a transmission message and is used by IT professionals to detect high-level errors within data transmissions. Prior to transmission, every piece of data or file can be assigned a checksum value after running a cryptographic hash function.
CHECKSUM. Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indexes. BINARY_CHECKSUM. Returns the binary checksum value computed over a row of a table or over a list of expressions.
The checksum command returns a number that represents the value of the data in the row. When you compare the checksum value of two rows, and the values match, this confirms the data on the rows match. If a change has occurred on one of the rows then the checksum value will change.
SQL CHECKSUM() and MD5 are Hash functions. Hashing is a one way algorithm which can take any number of chars/bytes and return a fixed number of chars/bytes.
It mean no matter if your input is 1 character or a complete book (War and Peace) you will get back the same length of response. So the input is infinite number of combinations meanwhile the output is finite. Based on that it is inevitable to get the same Hash for different values. It is called Hash collision. Good Hash algorithms try to mitigate this to make it hard to find these colliding values.
But enough theory about hashing. Here is exactly the answer to your question. What is the issue with CHECKSUM()?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With