Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why checksum() returns the same value for different string

The following SQL and its result shows the different string got the same checksum result. Why?

select  str ,
        binary_checksum(str) binary_checksum,
        checksum(str) checksum,
        hashbytes('md5', str) md5
from    ( values ( '2Volvo Director 20'), ( '3Volvo Director 30'), ( '4Volvo Director 40') ) 
        t ( str ) 
str                binary_checksum checksum    md5
------------------ --------------- ----------- --------------------------------------------
2Volvo Director 20 -1356512636     -383039272  0xB9BD78BCF70FAC36AF14FFF589767278
3Volvo Director 30 -1356512636     -383039272  0xF039462F3D15B162FFCDB6125D290826
4Volvo Director 40 -1356512636     -383039272  0xFAF315CDA6E453CCC09838CFB129EE74
like image 920
ca9163d9 Avatar asked Jan 30 '17 22:01

ca9163d9


People also ask

Is SQL checksum unique?

The CHECKSUM value depends on the collation. The same value stored with a different collation will return a different CHECKSUM value. CHECKSUM () does not guarantee unique results.

What is the checksum function?

A checksum is a value that represents the number of bits in a transmission message and is used by IT professionals to detect high-level errors within data transmissions. Prior to transmission, every piece of data or file can be assigned a checksum value after running a cryptographic hash function.

What is row checksum?

CHECKSUM. Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indexes. BINARY_CHECKSUM. Returns the binary checksum value computed over a row of a table or over a list of expressions.

What is a checksum in SSIS?

The checksum command returns a number that represents the value of the data in the row. When you compare the checksum value of two rows, and the values match, this confirms the data on the rows match. If a change has occurred on one of the rows then the checksum value will change.


1 Answers

SQL CHECKSUM() and MD5 are Hash functions. Hashing is a one way algorithm which can take any number of chars/bytes and return a fixed number of chars/bytes.

It mean no matter if your input is 1 character or a complete book (War and Peace) you will get back the same length of response. So the input is infinite number of combinations meanwhile the output is finite. Based on that it is inevitable to get the same Hash for different values. It is called Hash collision. Good Hash algorithms try to mitigate this to make it hard to find these colliding values.

But enough theory about hashing. Here is exactly the answer to your question. What is the issue with CHECKSUM()?

like image 150
Major Avatar answered Oct 05 '22 05:10

Major