Generating random values from uniform distribution with setting a seed in T-SQL

Tags:

I would like to generate a random value from the uniform distribution with mean=0 and a standard devation=1 for every row of a given data table in T-SQL. Additionally, I would like to set a seed in order to ensure reproducibility of analysis. Here are the ideas which did not work:

Using the function RAND() with a declared number does not fulfill this objective: the same random value is generated for every row of the data set.
Such a solution:

SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) AS [RandomNumber]

does not solve the problem either since it is not reproducible.

EDIT:

The performance does matter since my table has hundreds of millions of records.

961

asked Mar 01 '17 22:03

kaksat

4 Answers

The main question here IMHO is how you see 'repeatabililty'? Or asked differently: what 'drives' the randomness? I can envision a solution that sticks the same random number to every record for each run as long as the data doesn't change. However, what do you expect to happen if the data changes?

For the fun of it, I did the following tests on a (not very representative) test-table with 1 million of rows:

-- seed
SELECT Rand(0)

-- will show the same random number for EVERY record
SELECT Number, blah = Convert(varchar(100), NewID()), random = Rand()
  INTO #test
  FROM master.dbo.fn_int_list(1, 1000000)

CREATE UNIQUE CLUSTERED INDEX uq0_test ON #test (Number)

SET NOCOUNT ON

GO
DECLARE @start_time datetime = CURRENT_TIMESTAMP,
        @c_number int

-- update each record (one by one) and set the random number based on 'the next Rand()' value
-- => the order of the records drives the distribution of the Rand() value !

-- seed
SELECT @c_number = Rand(0) 

-- update 1 by 1 
DECLARE cursor_no_transaction CURSOR LOCAL STATIC
    FOR SELECT Number
          FROM #test
         ORDER BY Number
OPEN cursor_no_transaction 
FETCH NEXT FROM cursor_no_transaction INTO @c_number
WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE #test 
           SET random = Rand()
         WHERE Number = @c_number

        FETCH NEXT FROM cursor_no_transaction INTO @c_number
    END
CLOSE cursor_no_transaction 
DEALLOCATE cursor_no_transaction 

PRINT 'Time needed (no transaction) : ' + Convert(nvarchar(100), DateDiff(ms, @start_time, CURRENT_TIMESTAMP)) + ' ms.'

SELECT _avg = AVG(random), _stdev = STDEV(random) FROM #test

GO

DECLARE @start_time datetime = CURRENT_TIMESTAMP,
        @c_number int

BEGIN TRANSACTION

-- update each record (one by one) and set the random number based on 'the next Rand()' value
-- => the order of the records drives the distribution of the Rand() value !

-- seed
SELECT @c_number = Rand(0) 

-- update 1 by 1 but all of it inside 1 single transaction
DECLARE cursor_single_transaction CURSOR LOCAL STATIC
    FOR SELECT Number
          FROM #test
         ORDER BY Number
OPEN cursor_single_transaction 
FETCH NEXT FROM cursor_single_transaction INTO @c_number
WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE #test 
           SET random = Rand()
         WHERE Number = @c_number

        FETCH NEXT FROM cursor_single_transaction INTO @c_number
    END
CLOSE cursor_single_transaction 
DEALLOCATE cursor_single_transaction 

COMMIT TRANSACTION

PRINT 'Time needed (single transaction) : ' + Convert(nvarchar(100), DateDiff(ms, @start_time, CURRENT_TIMESTAMP)) + ' ms.'

SELECT _avg = AVG(random), _stdev = STDEV(random) FROM #test

GO

DECLARE @start_time datetime = CURRENT_TIMESTAMP

-- update each record (single operation), use the Number column to reseed the Rand() function for every record
UPDATE #test 
    SET random = Rand(Number)

PRINT 'Time needed Rand(Number) : ' + Convert(nvarchar(100), DateDiff(ms, @start_time, CURRENT_TIMESTAMP)) + ' ms.'

SELECT _avg = AVG(random), _stdev = STDEV(random) FROM #test

GO

DECLARE @start_time datetime = CURRENT_TIMESTAMP

-- update each record (single operation), use 'a bunch of fields' to reseed the Rand() function for every record
UPDATE #test 
    SET random = Rand(BINARY_CHECKSUM(Number, blah))

PRINT 'Time needed Rand(BINARY_CHECKSUM(Number, blah)) : ' + Convert(nvarchar(100), DateDiff(ms, @start_time, CURRENT_TIMESTAMP)) + ' ms.'

SELECT _avg = AVG(random), _stdev = STDEV(random) FROM #test

The outcomes are more or less as expected:

Time needed (no transaction) : 24570 ms.
_avg                   _stdev
---------------------- ----------------------
0.499630943538644      0.288686960086461

Time needed (single transaction) : 14813 ms.
_avg                   _stdev
---------------------- ----------------------
0.499630943538646      0.288686960086461

Time needed Rand(Number) : 1203 ms.
_avg                   _stdev
---------------------- ----------------------
0.499407423620328      0.291093824839539

Time needed Rand(BINARY_CHECKSUM(Number, blah)) : 1250 ms.
_avg                   _stdev
---------------------- ----------------------
0.499715398881586      0.288579510523627

All of these are 'repeatable', question is if 'repeatable' means what you want it to mean here. I've stuck to AVG() and STDEV() to get a coarse idea of the distribution, I'll leave it up to you to see if they actually fit the bill (and if not, how to improve upon it =)

1.2 seconds for 1 million rows doesn't sound too bad for 1 million of rows IMHO. That said, if your table contains extra columns it will take up more space and hence take more time!

Hope this gets you started...

129

answered Nov 05 '22 10:11

deroby

The Rand() function can be seeded at the start by passing it an integer seed value. If you do this once before generating any random numbers, the sequence of random numbers will be repeatable. Generating the values individually will ensure the Rand() function returns the numbers in sequence. The following will produce a uniform distribution of n pseudo-random numbers with mean=0 and standard deviation=1:

    DECLARE @Mean    FLOAT = 0.0; 
    DECLARE @stDev   FLOAT = 1.0; 
    DECLARE @n   INT = 100000;   -- count of random numbers to generate
    DECLARE @U   TABLE(x FLOAT); -- table of random numbers

    DECLARE @SEED    INT = 123456;    -- seed to ensure list is reproducible
    SELECT RAND(@Seed);

    SET NOCOUNT ON;
    BEGIN TRAN
    DECLARE @x INT = 0; -- counter
    WHILE @x < @n
      BEGIN
      INSERT INTO @U (x)
        SELECT @Mean + (2 * SQRT(3) * @stDev) * (RAND() - 0.5)
      SET @x = @x + 1;
      END;
    COMMIT

-- Check the results    
    SELECT * from @U;

    SELECT AVG([@U].x) AS mean,
        STDEV([@U].x) AS stDev
        FROM @U;

Instead of inserting into a temporary table in a while loop, you could loop through the records in your existing table using a cursor and do an Update on each record. As mentioned in the comments, performance could be an issue, but it meets the requirements "uniform distribution with mean=0 and a standard devation=1" and "reproducibility". The way the Rand() function works forces the "1 by 1" update.

Below is an alternative that will have much better performance (should run in under 2 seconds with 1 million rows) with a replacement for the Rand() function. This allows the records to be updated in a single UPDATE but relies on a unique numeric ID field in your table and updates a field called RandomNumber. The Rand() function is replaced by ( (ID * @SEED ) % 1000 ) / 1000 which can probably be improved upon.

DECLARE @Mean    FLOAT = 0.0; 
DECLARE @stDev   FLOAT = 1.0; 
DECLARE @SEED numeric(18,0)    = 1234567890.0;    -- seed to ensure list is reproducible

SET NOCOUNT ON;
BEGIN TRAN
UPDATE TestTable
   set Randomnumber = @Mean + (2 * SQRT(3) * @stDev) * (( (ID * @SEED ) % 1000 ) / 1000 - 0.5) 
COMMIT
-- Check the results    
SELECT AVG(RandomNumber) AS mean,
    STDEV(RandomNumber ) AS stDev
    FROM TestTable;

answered Nov 05 '22 11:11

chrisuae

DECLARE @userReportId BIGINT
SET @userReportId = FLOOR(RAND()*(10000000000000-1) + 1);

answered Nov 05 '22 10:11

Spythere

Repeatable random numbers are - quite probably - needed to repeat a situation, where tests went wrong in order to reproduce the circumstances of an exception.

The following suggestion will fill a physical table (add indexes!) with a position and a random number.

Use this list with a simple join to connect each of your rows with a random number.

Every call will bind the same random number to a given row.

Changing this can be done by re-positioning the randoms with a new random position (or you truncate-refill or drop-recreate the table).

This should be pretty fast...

CREATE TABLE dbo.MyRepeatableRandoms(CurrentPosition BIGINT,RandomNumber BIGINT);
GO
DECLARE @CountOfNumbers INT=5; --set a fitting max count here
WITH Tally AS
(
 SELECT TOP(@CountOfNumbers) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nr 
 FROM master..spt_values 
      CROSS JOIN master..spt_values X 
      CROSS JOIN master..spt_values Y
)
INSERT INTO dbo.MyRepeatableRandoms
SELECT Nr,CAST(CAST(NEWID() AS VARBINARY(8)) AS BIGINT) FROM Tally;

--Use this list with a simple join to bind it to the rows of your table
SELECT * FROM dbo.MyRepeatableRandoms ORDER BY CurrentPosition;

--Re-Position the list
WITH UpdateableCTE AS
(
 SELECT ROW_NUMBER() OVER(ORDER BY A.posOrder) AS NewPos
       ,CurrentPosition
 FROM dbo.MyRepeatableRandoms
 CROSS APPLY(SELECT NEWID() AS posOrder) AS A
)
UPDATE UpdateableCTE SET CurrentPosition=NewPos;

--The same random numbers at new positions
SELECT * FROM MyRepeatableRandoms ORDER BY CurrentPosition;

GO
DROP TABLE dbo.MyRepeatableRandoms

The result

    RandomNumber
1   -1939965404062448822
2   2786711671511266125
3   -3236707863137400753
4   -6029509773149087675
5   7815987559555455297

After re-positioning

    RandomNumber
1   7815987559555455297
2   -1939965404062448822
3   2786711671511266125
4   -6029509773149087675
5   -3236707863137400753

answered Nov 05 '22 09:11

Shnugo

Related questions
                            
                                Memory Optimized Tables - Slower INSERT than SSD
                            
                                Data profiling Task - custom Profile Request
                            
                                T-Sql Remove Decimal Point From Money Data Type
                            
                                Performance of large EAV/open schema systems on SQL Server
                            
                                First Name Variations in a Database
                            
                                What's the difference between peer-to-peer and merge replication with SQL Server?
                            
                                Table variable poor performance on insert in SQL Server Stored Procedure
                            
                                Tools for Building an OCA (Occasionally Connected Application)
                            
                                Looking for solution for database versioning
                            
                                Replace with wildcard, in SQL
                            
                                How to add more OR searches with CONTAINS Brings Query to Crawl?
                            
                                Should I use 'Integrated Security=True' in a production environment?
                            
                                Passing 'current' row to a function in SQL Server
                            
                                Using stored procedures with Entity Framework 4 Code First CTP5
                            
                                Can R produce on-the-fly graphs for website?
                            
                                incorrect syntax near begin expecting EXTERNAL
                            
                                Sql server's Constant scan - clarification?
                            
                                Entity Framework 6 (code first) entity versioning and auditing
                            
                                Use Option (Recompile) in an Inline Table Valued Function
                            
                                SQL Server on Linux > Bulk Import error

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Generating random values from uniform distribution with setting a seed in T-SQL

Tags:

sql-server

tsql

random

random-seed