I have a table in SQL Server 2008 R2 that contains two fields (WordHash, Word). The WordHash field was generated in C#, and I need to regenerate the hash for the Word field in SQL Server.
My problem is that the MD5 hashes generated in SQL Server and in C# are different. I found the code below to resolve this, but I still have the same problem.
SQL code:
CONVERT(NVARCHAR(32),HASHBYTES('MD5', 'some word'), 2)
After putting this code block into my query, I saw a weird result:
My Query:
SELECT
[WordHash],
convert(nvarchar(32),HASHBYTES('MD5', 'Analytics'),2) AS TestHash,
convert(nvarchar(32),HASHBYTES('MD5', [Word]),2) AS SqlHash
FROM myTable
Result:
WordHash: A768CAA988605A2846599CF7E2D0C26A
TestHash: A768CAA988605A2846599CF7E2D0C26A
SqlHash: F4AFA5FEF805F7F5163EC6402BAF61FF
Note that 'Analytics' is the value of one of the records in the database.
Why are TestHash and SqlHash different when they are generated by the same code?
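The issue is that NVARCHAR and VARCHAR values hash to different results. HASHBYTES hashes the raw bytes of its input, and an NVARCHAR value stores each character as two bytes (UTF-16), whereas a VARCHAR value holding plain ASCII text stores one byte per character. Both HASHBYTES('MD5', 'Analytics') and [WordHash] are hashes of VARCHAR values, but [Word] is an NVARCHAR.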
select HASHBYTES('MD5', 'Analytics'), 'varchar'
union
select HASHBYTES('MD5', N'Analytics'), 'nvarchar'
--outputs
------------------------------------- --------
0xA768CAA988605A2846599CF7E2D0C26A varchar
0xF4AFA5FEF805F7F5163EC6402BAF61FF nvarchar
To fix this, you must either change [Word] to VARCHAR or recompute [WordHash] using NVARCHAR values.
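The same effect can be reproduced outside SQL Server, which may help when checking which encoding the application side should use. The sketch below is only an illustration in Python, not the asker's C# code (which is not shown in the question). It assumes that "ascii" stands in for a VARCHAR value containing plain ASCII text and that "utf-16-le" stands in for an NVARCHAR value. The helper name md5_hex is hypothetical.

```python
import hashlib


def md5_hex(text: str, encoding: str) -> str:
    """Return the uppercase hex MD5 digest of `text` encoded with `encoding`."""
    return hashlib.md5(text.encode(encoding)).hexdigest().upper()


word = "Analytics"

# One byte per character: assumed to match a VARCHAR value with ASCII-only text.
varchar_like = md5_hex(word, "ascii")

# Two bytes per character (UTF-16 little-endian): assumed to match NVARCHAR.
nvarchar_like = md5_hex(word, "utf-16-le")

print("varchar-like :", varchar_like)
print("nvarchar-like:", nvarchar_like)
print("same hash?   :", varchar_like == nvarchar_like)
```

The two digests differ because MD5 sees 9 input bytes in the first case and 18 in the second. In C#, the corresponding choice is between Encoding.ASCII or Encoding.UTF8 (one byte per ASCII character) and Encoding.Unicode (UTF-16 little-endian), so whichever encoding is used there should match the column type being hashed in SQL Server.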
Some useful further reading: Comparing SQL Server HASHBYTES function and .Net hashing