Random Weighted Choice in T-SQL

How do you randomly select a table row in T-SQL based on an applied weight for all candidate rows?

For example, I have a set of rows in a table weighted at 50, 25, and 25 (which adds up to 100 but does not need to), and I want to select one of them at random so that each row's chance of being picked is proportional to its weight.

asked Sep 12 '08 07:09

Dane

3 Answers

Dane's answer uses a self join that grows quadratically: for a table of n rows, the join produces roughly n*n/2 rows.
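To see that growth concretely, here is a small sketch that counts the rows such a self join produces. It reuses the three-row table variable from the examples in this thread and assumes SQL Server 2008 or later for the multi-row VALUES syntax.

```sql
DECLARE @table TABLE (id int, weight int);
INSERT INTO @table (id, weight) VALUES (1, 50), (2, 25), (3, 25);

-- Same join condition as the self-join answer:
-- each row pairs with every row whose id is <= its own
SELECT COUNT(*) AS joined_rows
FROM @table t1
INNER JOIN @table t2 ON t1.id >= t2.id;
-- 3 rows give 6 pairs, i.e. n * (n + 1) / 2
```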

It would be better to scan the table just once.

DECLARE @id int, @weight_sum int, @weight_point int
DECLARE @table TABLE (id int, weight int)

INSERT INTO @table(id, weight) VALUES(1, 50)
INSERT INTO @table(id, weight) VALUES(2, 25)
INSERT INTO @table(id, weight) VALUES(3, 25)

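SELECT @weight_sum = SUM(weight)
FROM @table

-- Uniform integer in 0 .. @weight_sum - 1, so every unit of weight is equally likely
SELECT @weight_point = FLOOR(@weight_sum * RAND())

-- Caution: this relies on the assignments being evaluated once per row,
-- in ORDER BY order, which SQL Server does not document as guaranteed
SELECT
    @id = CASE WHEN @weight_point < 0 THEN @id ELSE [table].id END,
    @weight_point = @weight_point - [table].weight
FROM
    @table [table]
ORDER BY
    [table].weight DESC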

This walks through the table, setting @id to each record's id while decrementing @weight_point by that record's weight. Eventually @weight_point goes negative, which means the running sum of the weights seen so far exceeds the randomly chosen target value. That is the record we want, so from that point onwards @id is assigned to itself and the remaining ids in the table are ignored.
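The same running-sum idea can also be sketched with a window function, which avoids relying on per-row variable assignment. This is an alternative technique, not the code above, and it assumes SQL Server 2012 or later (for windowed SUM with ORDER BY) and unique id values.

```sql
DECLARE @table TABLE (id int, weight int);
INSERT INTO @table (id, weight) VALUES (1, 50), (2, 25), (3, 25);

DECLARE @weight_sum int = (SELECT SUM(weight) FROM @table);
-- Uniform integer in 0 .. @weight_sum - 1
DECLARE @weight_point int = FLOOR(@weight_sum * RAND());

-- Each row carries the cumulative weight of itself and all rows before it;
-- the first row whose running total exceeds the target is the pick
SELECT TOP (1) id
FROM (
    SELECT id,
           SUM(weight) OVER (ORDER BY id ROWS UNBOUNDED PRECEDING) AS running_total
    FROM @table
) AS t
WHERE running_total > @weight_point
ORDER BY running_total, id;  -- id breaks ties caused by zero-weight rows
```

With the sample weights, targets 0 to 49 map to id 1, 50 to 74 to id 2, and 75 to 99 to id 3.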

This runs through the table just once, but it has to read the entire table even if the chosen record is the first one. Because the chosen record sits, on average, half way through the table (and earlier if the rows are ordered by descending weight), a loop that stops early could be faster, especially if many rows share the same weights:

DECLARE @id int, @weight_sum int, @weight_point int, @next_weight int, @row_count int
DECLARE @table TABLE (id int, weight int)

INSERT INTO @table(id, weight) VALUES(1, 50)
INSERT INTO @table(id, weight) VALUES(2, 25)
INSERT INTO @table(id, weight) VALUES(3, 25)

SELECT @weight_sum = SUM(weight)
FROM @table

-- Uniform integer in 1 .. @weight_sum (ROUND would halve the odds of both end values)
SELECT @weight_point = FLOOR(@weight_sum * RAND()) + 1

SELECT @next_weight = MAX(weight) FROM @table
SELECT @row_count   = COUNT(*)    FROM @table WHERE weight = @next_weight
SET @weight_point = @weight_point - (@next_weight * @row_count)

WHILE (@weight_point > 0)
BEGIN
    SELECT @next_weight = MAX(weight) FROM @table WHERE weight < @next_weight
    SELECT @row_count   = COUNT(*)    FROM @table WHERE weight = @next_weight
    SET @weight_point = @weight_point - (@next_weight * @row_count)
END

-- Once @weight_point is zero or less, we know that the randomly chosen record
-- is in the group of records WHERE [table].weight = @next_weight

-- Reuse @row_count as a uniform offset in 0 .. (group size - 1)
SELECT @row_count = FLOOR(@row_count * RAND())

SELECT
    @id = CASE WHEN @row_count < 0 THEN @id ELSE [table].id END,
    @row_count = @row_count - 1
FROM
    @table [table]
WHERE
    [table].weight = @next_weight
ORDER BY
    [table].id

answered Oct 17 '22 01:10

MatBailie

You simply need to sum the weights of all candidate rows, choose a random point within that sum, and then select the record that corresponds to that point (each record carries with it the accumulated sum of its own weight and the weights of the records before it).

DECLARE @id int, @weight_sum int, @weight_point int
DECLARE @table TABLE (id int, weight int)

INSERT INTO @table(id, weight) VALUES(1, 50)
INSERT INTO @table(id, weight) VALUES(2, 25)
INSERT INTO @table(id, weight) VALUES(3, 25)

SELECT @weight_sum = SUM(weight)
FROM @table

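-- Uniform integer in 1 .. @weight_sum (ROUND would halve the odds of both end values)
SELECT @weight_point = FLOOR(@weight_sum * RAND()) + 1

-- Pick the first row whose cumulative weight (its own plus all lower ids) reaches the target
SELECT TOP 1 @id = t1.id
FROM @table t1
INNER JOIN @table t2 ON t1.id >= t2.id
GROUP BY t1.id
HAVING SUM(t2.weight) >= @weight_point
ORDER BY t1.id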

SELECT @id
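One way to sanity-check the distribution is to repeat the pick many times and tally the results. The sketch below is only an illustration: the @tally table, the @i counter, and the 10,000-iteration count are arbitrary choices, and it draws @weight_point uniformly from 1 to @weight_sum.

```sql
DECLARE @table TABLE (id int, weight int);
INSERT INTO @table (id, weight) VALUES (1, 50), (2, 25), (3, 25);

DECLARE @tally TABLE (id int);
DECLARE @weight_sum int = (SELECT SUM(weight) FROM @table);
DECLARE @weight_point int, @i int = 0;

WHILE @i < 10000
BEGIN
    -- Uniform integer in 1 .. @weight_sum
    SET @weight_point = FLOOR(@weight_sum * RAND()) + 1;

    -- Record the id picked for this target
    INSERT INTO @tally (id)
    SELECT TOP (1) t1.id
    FROM @table t1
    INNER JOIN @table t2 ON t1.id >= t2.id
    GROUP BY t1.id
    HAVING SUM(t2.weight) >= @weight_point
    ORDER BY t1.id;

    SET @i = @i + 1;
END

-- Share of picks per id
SELECT id, COUNT(*) AS picks, 100.0 * COUNT(*) / 10000 AS pct
FROM @tally
GROUP BY id
ORDER BY id;
```

The percentages should land near 50, 25, and 25, with some run-to-run variation.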

answered Oct 17 '22 03:10

Dane

The "incrementally carrying an accumulating weight sum" part is expensive if you have a lot of records. If you already have a wide range of scores/weights (i.e. the range is wide enough that most records' weights are unique; 1-5 stars probably wouldn't cut it), you can do something like this to pick a weight value. I'm using VB.Net here to demonstrate, but this could easily be done in pure SQL as well:

Function PickScore() As Double
    'Assumes a database wrapper instance called SQL and an already seeded System.Random instance called rng
    'Get the count of scores in the database
    Dim ScoreCount As Integer = CInt(SQL.ExecuteScalar("SELECT COUNT(score) FROM [MyTable]"))
    'You could also approximate this with just the number of records in the table, which might be faster.

    'Random number in [0, 1) with ScoreCount possible values
    Dim rand As Double = rng.Next(ScoreCount) / CDbl(ScoreCount)

    'Use the equation y = 1 - x^3 to skew results in favor of higher scores
    'For x in [0, 1), y is in (0, 1] with a strong bias towards 1
    rand = 1 - (rand * rand * rand)

    'Now map the interval (0, 1] onto (0, MaxScore]:
    'just find MaxScore and multiply by rand
    Dim MaxScore As Double = CDbl(SQL.ExecuteScalar("SELECT MAX(score) FROM [MyTable]"))
    Return MaxScore * rand
End Function

Run this, and pick the record with the largest score that is less than or equal to the returned value. If more than one record shares that score, pick one of them at random. The advantages here are that you don't have to maintain any sums, and you can tweak the probability equation to suit your tastes. But again, it works best with a wide distribution of scores.
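A pure T-SQL version of this idea might look like the sketch below. The @songs table variable and its values are hypothetical, and it uses a continuous RAND() value rather than the discrete draw in the VB.Net code, so treat it as an approximation of the approach and not a line-for-line port.

```sql
-- Hypothetical sample data
DECLARE @songs TABLE (id int, score int);
INSERT INTO @songs (id, score) VALUES (1, 10), (2, 40), (3, 40), (4, 75), (5, 90);

DECLARE @x float = RAND();                   -- uniform in [0, 1)
DECLARE @skewed float = 1 - POWER(@x, 3);    -- in (0, 1], biased toward 1
DECLARE @target float;

-- Scale the skewed value onto (0, MAX(score)]
SELECT @target = @skewed * MAX(score) FROM @songs;

-- Largest score not above the target; NEWID() breaks ties at random
SELECT TOP (1) id, score
FROM @songs
WHERE score <= @target
ORDER BY score DESC, NEWID();
```

Note that if the target falls below the lowest score in the table, this query returns no row, so a caller would need to retry or fall back to the lowest-scored record.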

answered Oct 17 '22 02:10

Joel Coehoorn

Tags: sql-server, tsql, random
