I'm trying to generate a random time between 8:00 AM and 8:00 PM for each row that is selected from a data set, however, I always get the same random value for each row – I want it to be different for each row.
Table schema & data:
╔══════╦════════════════╗
║ ID ║ CREATED_DATE ║
╠══════╬════════════════╣
║ ID/1 ║ 26/04/2014 ║
║ ID/2 ║ 26/04/2014 ║
║ ID/3 ║ 26/04/2014 ║
║ ID/4 ║ 26/04/2014 ║
║ ID/5 ║ 26/04/2014 ║
╚══════╩════════════════╝
Сurrent SQL statement:
SELECT [ID]
, MyFunction.dbo.AddWorkDays(14, [CREATED_DATE]) AS [New Date]
, CONVERT(VARCHAR, DATEADD(MILLISECOND, CAST(43200000 * RAND() AS INT), CONVERT(TIME, '08:00')), 114) AS [New Time]
FROM [RandomTable]
Current results (same time for each row in the [New Time]
column):
╔══════╦════════════════╦════════════════╗
║ ID ║ New Date ║ New Time ║
╠══════╬════════════════╬════════════════╣
║ ID/1 ║ 10/05/2014 ║ 09:41:43 ║
║ ID/2 ║ 10/05/2014 ║ 09:41:43 ║
║ ID/3 ║ 10/05/2014 ║ 09:41:43 ║
║ ID/4 ║ 10/05/2014 ║ 09:41:43 ║
║ ID/5 ║ 10/05/2014 ║ 09:41:43 ║
╚══════╩════════════════╩════════════════╝
Desired results (different time for each row in the [New Time]
column):
╔══════╦════════════════╦════════════════╗
║ ID ║ New Date ║ New Time ║
╠══════╬════════════════╬════════════════╣
║ ID/1 ║ 10/05/2014 ║ 09:41:43 ║
║ ID/2 ║ 10/05/2014 ║ 15:05:23 ║
║ ID/3 ║ 10/05/2014 ║ 10:01:05 ║
║ ID/4 ║ 10/05/2014 ║ 19:32:45 ║
║ ID/5 ║ 10/05/2014 ║ 08:43:15 ║
╚══════╩════════════════╩════════════════╝
Any ideas on how to fix this? All of the above is just sample data – my real table has around 2800 records (not sure if that will make a difference to anyone's suggestions).
The issue OP had while using just rand()
is due to it's evaluation once per query.
From the documentation:
If seed is not specified, the SQL Server Database Engine assigns a seed value at random. For a specified seed value, the result returned is always the same.
Approach that is described below removes optimization and suppresses this behavior, so rand()
is evaluated once per row:
dateadd( second
, rand(cast(newid() as varbinary)) * 43200
, cast('08:00:00' as time) )
newid()
generates unique value of type uniqueidentifier
;cast
to be used as seed in rand([seed])
function to generate a pseudo-random float
value from 0 through 1, and as seed is always unique the returning value is unique too .SQLFiddle
You could alternatively use:
SELECT DATEADD(s, ABS(CHECKSUM(NewId()) % 43201), CAST('08:00:00' AS Time))
The ABS(CHECKSUM(NewId()) % 43201)
generates a random number between 0
and 43200
. See Discussion here.
SQL Fiddle
MS SQL Server 2008 Schema Setup:
Query 1:
SELECT DATEADD(s, ABS(CHECKSUM(NewId()) % 43201), CAST('08:00:00' AS Time)) AS [RandomTime]
FROM
( VALUES (1), (2), (3), (4), (5)
) Y(A)
CROSS JOIN
( VALUES (1), (2), (3), (4), (5)
) Z(A)
Results:
| RANDOMTIME |
|------------------|
| 16:51:58.0000000 |
| 10:42:44.0000000 |
| 14:01:38.0000000 |
| 13:33:51.0000000 |
| 18:00:51.0000000 |
| 11:29:03.0000000 |
| 10:21:14.0000000 |
| 16:38:27.0000000 |
| 09:55:37.0000000 |
| 13:21:13.0000000 |
| 11:29:37.0000000 |
| 10:57:49.0000000 |
| 14:56:42.0000000 |
| 15:33:11.0000000 |
| 18:49:45.0000000 |
| 16:23:28.0000000 |
| 09:00:05.0000000 |
| 09:20:01.0000000 |
| 11:26:23.0000000 |
| 15:26:23.0000000 |
| 10:38:44.0000000 |
| 11:46:30.0000000 |
| 16:00:59.0000000 |
| 09:29:18.0000000 |
| 09:09:19.0000000 |
The question states:
Now factor in the following points:
There is some ambiguity in the following areas:
Given the information above, there are a few ways to interpret the request:
RAND(CAST(NEWID() AS VARBINARY)) * 43200
ABS(CHECKSUM(NewId()) % 43201)
CAST(43200000 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) as int)
Hence, I based my answer on the idea that:
If the situation requires unique times, that cannot be guaranteed with any method of generating truly random values. I really like the use of CRYPT_GEN_RANDOM
by @Vladimir Baranov, but it is nearly impossible to get a unique set of values generated:
DECLARE @Table TABLE (Col1 BIGINT NOT NULL UNIQUE);
INSERT INTO @Table (Col1)
SELECT CONVERT(BIGINT, CRYPT_GEN_RANDOM(4))
FROM [master].sys.objects so
CROSS JOIN [master].sys.objects so2
CROSS JOIN [master].sys.objects so3;
-- 753,571 rows
Increasing the random value to 8 bytes does seem to work:
DECLARE @Table TABLE (Col1 BIGINT NOT NULL UNIQUE);
INSERT INTO @Table (Col1)
SELECT CONVERT(BIGINT, CRYPT_GEN_RANDOM(8))
FROM [master].sys.objects so
CROSS JOIN [master].sys.objects so2
CROSS JOIN [master].sys.objects so3;
-- 753,571 rows
Of course, if we are generating down to the second, then there are only 86,400 of those. Reducing the scope seems to help as the following does occasionally work:
DECLARE @Table TABLE (Col1 BIGINT NOT NULL UNIQUE);
INSERT INTO @Table (Col1)
SELECT TOP (86400) CONVERT(BIGINT, CRYPT_GEN_RANDOM(4))
FROM [master].sys.objects so
CROSS JOIN [master].sys.objects so2
CROSS JOIN [master].sys.objects so3;
However, things get a bit trickier if the uniqueness needs per each day (which seems like a reasonable requirement of this type of project, as opposed to unique across all days). But a random number generator isn't going to know to reset at each new day.
If it is acceptable to merely have the appearance of being random, then we can guarantee uniqueness per each date without:
RAND()
, NEWID()
, or CRYPT_GEN_RANDOM()
The following solution uses the concept of Modular Multiplicative Inverses (MMI) which I learned about in this answer: generate seemingly random unique numeric ID in SQL Server . Of course, that question did not have a tightly-defined range of values like we have here with only 86,400 of them per day. So, I used a range of 86400 (as "Modulo") and tried a few "coprime" values (as "Integer") in an online calculator to get their MMIs:
I use ROW_NUMBER()
in a CTE, partitioned (i.e. grouped) by CREATED_DATE
as a means of assigning each second of the day a value.
But, while the values generated for seconds 0, 1, 2, ... and so on sequentially will appear random, across different days that particular second will map to the same value. So, the second CTE (named "WhichSecond") shifts the starting point for each date by converting the date to an INT (which converts dates to a sequential offset from 1900-01-01) and then multiply by 101.
DECLARE @Data TABLE
(
ID INT NOT NULL IDENTITY(1, 1),
CREATED_DATE DATE NOT NULL
);
INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05');
INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05');
INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05');
INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05');
INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05');
INSERT INTO @Data (CREATED_DATE) VALUES ('2015-03-15');
INSERT INTO @Data (CREATED_DATE) VALUES ('2016-10-22');
INSERT INTO @Data (CREATED_DATE) VALUES ('2015-03-15');
;WITH cte AS
(
SELECT tmp.ID,
CONVERT(DATETIME, tmp.CREATED_DATE) AS [CREATED_DATE],
ROW_NUMBER() OVER (PARTITION BY tmp.CREATED_DATE ORDER BY (SELECT NULL))
AS [RowNum]
FROM @Data tmp
), WhichSecond AS
(
SELECT cte.ID,
cte.CREATED_DATE,
((CONVERT(INT, cte.[CREATED_DATE]) - 29219) * 101) + cte.[RowNum]
AS [ThisSecond]
FROM cte
)
SELECT parts.*,
(parts.ThisSecond % 86400) AS [NormalizedSecond], -- wrap around to 0 when
-- value goes above 86,400
((parts.ThisSecond % 86400) * 39539) % 86400 AS [ActualSecond],
DATEADD(
SECOND,
(((parts.ThisSecond % 86400) * 39539) % 86400),
parts.CREATED_DATE
) AS [DateWithUniqueTime]
FROM WhichSecond parts
ORDER BY parts.ID;
Returns:
ID CREATED_DATE ThisSecond NormalizedSecond ActualSecond DateWithUniqueTime
1 2014-10-05 1282297 72697 11483 2014-10-05 03:11:23.000
2 2014-10-05 1282298 72698 51022 2014-10-05 14:10:22.000
3 2014-10-05 1282299 72699 4161 2014-10-05 01:09:21.000
4 2014-10-05 1282300 72700 43700 2014-10-05 12:08:20.000
5 2014-10-05 1282301 72701 83239 2014-10-05 23:07:19.000
6 2015-03-15 1298558 2558 52762 2015-03-15 14:39:22.000
7 2016-10-22 1357845 61845 83055 2016-10-22 23:04:15.000
8 2015-03-15 1298559 2559 5901 2015-03-15 01:38:21.000
If we want to only generate times between 8:00 AM and 8:00 PM, we only need to make a few minor adjustments:
28800
to the second parameter of the DATEADD
as an 8 hour offsetThe result will be a change to just one line (since the others are diagnostic):
-- second parameter of the DATEADD() call
28800 + (((parts.ThisSecond % 43200) * 39539) % 43200)
Another means of shifting each day in a less predictable fashion would be to make use of RAND()
by passing in the INT form of CREATED_DATE
in the "WhichSecond" CTE. This would give a stable offset per each date since RAND(x)
will return the same value y
for the same value of x
passed in, but will return a different value y
for a different value of x
passed in. Meaning:
RAND(1) = y1
RAND(2) = y2
RAND(3) = y3
RAND(2) = y2
The second time RAND(2)
was called, it still returned the same value of y2
that it returned the first time it was called.
Hence, the "WhichSecond" CTE could be:
(
SELECT cte.ID,
cte.CREATED_DATE,
(RAND(CONVERT(INT, cte.[CREATED_DATE])) * {some number}) + cte.[RowNum]
AS [ThisSecond]
FROM cte
)
There are several methods:
NEWID
function to provide a seed for RAND
. It should be used with caution, because there is no guarantee about distribution of NEWID values. One of the best methods to make it more or less uniformly distributed is via the CHECKSUM
: RAND(CHECKSUM(NEWID()))
. The good thing about this method is that NEWID function is available since SQL Server 2000.NEWID
use, say, MD5 of some column as a seed for RAND
: RAND(CHECKSUM(HASHBYTES('MD5', CAST(SomeID AS varbinary(4)))))
Or simply row number: RAND(CHECKSUM(HASHBYTES('MD5', CAST(ROW_NUMBER() OVER(ORDER BY ...) AS varbinary(4)))))
. This method is available since at least SQL Server 2005. The primary difference from NEWID
method is that you have full control over the random sequence. You can't control what NEWID
returns and you can't restart the random sequence from the same number again. If you supply same sets of, say, row numbers using PARTITION BY
you'll get same sets of random numbers. It may be useful in the cases when you need to use the same sequence of random numbers several times. It is possible to get the same random number for two different seeds. I tested it for row numbers from 1 to 1,000,000. MD5
of them are all different. CHECKSUM
of MD5
result in 122 collisions. RAND
of this CHECKSUM
result in 246 collisions. When tested with row numbers from 1 to 100,000 CHECKSUM
had 1 collision, RAND
had 3 collisions.Random
class, or RNGCryptoServiceProvider
class.CRYPT_GEN_RANDOM
.I will describe the last method in detail, because I think that it is a very good solution for SQL Server 2008 and above. CRYPT_GEN_RANDOM
is called for each row of the result set, as opposed to RAND
, which is called only once.
CRYPT_GEN_RANDOM (Transact-SQL)
Returns a cryptographic random number generated by the Crypto API (CAPI). The output is a hexadecimal number of the specified number of bytes.
Besides, CRYPT_GEN_RANDOM
should provide much better random values, than RAND
. Better in terms of distribution and crypto-strength. Example:
(CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5)
This generates 4 random bytes as varbinary
. We have to explicitly cast them to int
first. Then result is transformed into a float number between 0 and 1.
So, the original query would like this:
SELECT ID AS [ID]
, MyFunction.dbo.AddWorkDays(14, S.CREATED_DATE) AS [New Date]
, CONVERT(VARCHAR, DATEADD(MILLISECOND,
CAST(43200000 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) as int),
CONVERT(TIME, '08:00')), 114) AS [New Time]
FROM RandomTable
Here is a stand-alone example that is easy to copy-paste and try (I used the query from another answer by @Steve Ford):
SELECT DATEADD(millisecond,
CAST(43200000 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) as int),
CAST('08:00:00' AS Time)) AS [RandomTime]
FROM
( VALUES (1), (2), (3), (4), (5)
) Y(A)
CROSS JOIN
( VALUES (1), (2), (3), (4), (5)
) Z(A)
This is the result:
RandomTime
10:58:24.7200000
19:40:06.7220000
11:04:29.0530000
08:57:31.6130000
15:03:14.9470000
09:15:34.9380000
13:46:43.1250000
11:27:00.8940000
14:42:23.6100000
15:07:56.2120000
11:39:09.8830000
08:16:44.3960000
14:23:38.4820000
17:28:31.7440000
16:29:31.4320000
09:09:15.0210000
12:31:09.8370000
11:23:09.8430000
15:35:45.5480000
17:42:49.3390000
08:07:05.4930000
18:17:16.2980000
11:49:08.2010000
10:20:21.7620000
15:56:58.6110000
When I read the original question I didn't think that it is really necessary to ensure that all generated random numbers are unique.
I interpreted the word "different" in the question as a vague opposite to seeing the same number in each row of the result that you see when using a simple SELECT RAND()
.
I think that in many cases it doesn't matter if there are few colliding random numbers. In many cases it would actually be the correct behavior.
So, my understanding is that when there is a need of a sequence of unique random numbers, it is in a sense equivalent to the following task. We have a set of some values/rows, for example, a set of unique IDs or all 86400 seconds of a day or 2800 rows for a given day. We want to shuffle these values/rows. We want to rearrange these rows in a random order.
To shuffle the given set of rows we simply need to ORDER BY
random numbers (these random numbers may have reasonable amount of collisions here). Random numbers could be generated by any method. Something like this:
ROW_NUMBER() OVER ([optional PARTITION BY ...] ORDER BY CRYPT_GEN_RANDOM(4))
or literally
SELECT ...
FROM ...
ORDER BY CRYPT_GEN_RANDOM(4)
depending on where and how it is used.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With