Our application has the following table definition:
CREATE TABLE [dbo].[Phrase] (
[PhraseId] UNIQUEIDENTIFIER DEFAULT (newid()) NOT NULL,
[English] NVARCHAR (250) NOT NULL,
[EnglishHash] AS (CONVERT([bigint],hashbytes('md5',[English])%(5)+(5))) PERSISTED,
PRIMARY KEY CLUSTERED ([PhraseId] ASC)
);
The intention was for the EnglishHash column to be a value of either 0,1,2,3,4,5,6,7,8, or 9
However it's only giving values: 1,2,3,4,5,6,7,8, or 9
Can anyone help to explain how I can modify this so it gives values 0-9 inclusive?
Note that I tried out the suggestion by Sandip. This gives me a distribution but over 11,000 records the distribution is not what I expected:
0 593
9 652
3 1324
6 1253
7 1293
1 1932
4 1325
5 1282
2 1295
8 635
Your results only allow you to have 9 distinct values because you are taking the modulus of 5. Look at the below results. if you were to continue taking the modulus, you'll see it cannot be any integer greater than 4 and no less than -4 (this is before adding the last +5). It's just going to start looping. Instead why don't you take modulus of 10 of the absolute value of the bigint value of the binary hash.
SELECT 0%5
, 1%5
, 2%5
, 3%5
, 4%5
, 5%5
, 6%5
SELECT 0%5
, -1%5
, -2%5
, -3%5
, -4%5
, -5%5
, -6%5
Try using this instead
ABS(CONVERT(bigint, HASHBYTES('md5',[English])))%10
Here's a nice example using the system error messages for some random text.
SELECT ABS(CONVERT(bigint, HASHBYTES('md5',[text])))%10 AS 'Result'
, COUNT(*) AS 'Distribution'
from sys.messages
GROUP BY ABS(CONVERT(bigint, HASHBYTES('md5',[text])))%10
ORDER BY ABS(CONVERT(bigint, HASHBYTES('md5',[text])))%10
Results:
Result Distribution
0 25326
1 25218
2 25115
3 25322
4 25167
5 25322
6 25278
7 25119
8 25139
9 25158
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With