I have a list of items. When I create the list, each item has an equal chance to be selected. But as an item is selected, its chance goes down while the others' chances go up. If a new item is added during the process, it should have the highest chance to be selected, with its chance falling off as it is selected. I am looking for a good algorithm that can accomplish this in C#.
Generalized idea: I have 5 items; over time, each of the 5 items should be selected 20% of the time. I am trying to keep the selections as close to that 20% as possible, cutting down on outliers. If an outlier exists, it should be selected more/less often to bring it back in line.
Use a bucket weighted queue: Instead of using a list, divide your collection into buckets - each bucket has an associated frequency of retrieval. Items move from higher frequency buckets to lower frequency ones as they are selected.
An easy way to implement this is to assign each bucket a range of values, and generate a random number within the combined range to pick a bucket to choose from. You'll probably want to abstract this collection into some kind of class so that you don't expose consumers to the details.
Algorithm:
Initially, all items start in the same (top-level) bucket.
When you select an item, move it from the bucket it's in to the next lower bucket. If necessary, create the next-level bucket.
When a new item is added, add it to the top-most (most frequently used) bucket.
To randomly select an item, first pick a bucket, then pick an item within that bucket. Move the selected item down into the next-level bucket. Note that moving an item down to a lower-frequency bucket is optional - you can set some cutoff point.
When creating a new bucket, update the retrieval range associated with all buckets to give the new set the frequency distribution characteristic you want.
C# implementation (first cut) of a generic bucketed weighted queue:
using System;
using System.Collections.Generic;
using System.Linq;

namespace Utility
{
    public class BucketWeightedQueue<T>
    {
        private readonly double m_FallOffRate;
        private readonly List<List<T>> m_Buckets;
        private readonly List<double> m_FallOffFactors;
        private readonly Random m_Rand = new Random();

        public BucketWeightedQueue( IEnumerable<T> items, double fallOffRate )
        {
            if( fallOffRate < 0 )
                throw new ArgumentOutOfRangeException( nameof(fallOffRate) );

            m_FallOffRate = fallOffRate;
            m_Buckets = new List<List<T>> { items.ToList() };
            m_FallOffFactors = new List<double> { 1.0 };
        }

        // new items enter the top-most (most frequently selected) bucket
        public void Add( T item )
        {
            m_Buckets[0].Add( item );
        }

        public T GetRandomItem()
        {
            var bucketIndex = GetRandomBucketIndex();
            var bucket = m_Buckets[bucketIndex];
            var itemIndex = m_Rand.Next( bucket.Count );
            var selectedItem = bucket[itemIndex];
            ReduceItemProbability( bucketIndex, itemIndex );
            return selectedItem;
        }

        private int GetRandomBucketIndex()
        {
            // a bucket's retrieval weight is the inverse of its fall-off factor,
            // so a bucket with factor 2 is chosen half as often as one with factor 1
            var totalWeight = m_FallOffFactors.Sum( f => 1.0 / f );
            var randValue = m_Rand.NextDouble() * totalWeight;
            for( int i = 0; i < m_FallOffFactors.Count; i++ )
            {
                randValue -= 1.0 / m_FallOffFactors[i];
                if( randValue < 0 )
                    return i;
            }
            return m_FallOffFactors.Count - 1; // guard against rounding error
        }

        private void ReduceItemProbability( int bucketIndex, int itemIndex )
        {
            if( m_FallOffRate <= 0 )
                return; // do nothing if there is no fall off rate...

            var item = m_Buckets[bucketIndex][itemIndex];
            m_Buckets[bucketIndex].RemoveAt( itemIndex );

            if( bucketIndex == m_Buckets.Count - 1 )
            {
                // the item came from the lowest-frequency bucket, so create the next one
                m_Buckets.Add( new List<T>() );
                m_FallOffFactors.Add( m_FallOffFactors[bucketIndex] / m_FallOffRate );
            }

            m_Buckets[bucketIndex + 1].Add( item );

            if( m_Buckets[bucketIndex].Count == 0 ) // drop buckets once they empty
            {
                m_Buckets.RemoveAt( bucketIndex );
                m_FallOffFactors.RemoveAt( bucketIndex ); // keep factors in sync
            }
        }
    }
}
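A quick usage sketch (the five items and the 0.5 fall-off rate are hypothetical values, just for illustration):

var queue = new BucketWeightedQueue<string>( new[] { "a", "b", "c", "d", "e" }, 0.5 );
for( int i = 0; i < 20; i++ )
    Console.Write( queue.GetRandomItem() + " " );
queue.Add( "f" ); // a late arrival enters the top bucket, where the odds are highest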
Here we will engineer a random number generator whose distribution favors low values. You can use it to prefer items at the beginning of a list: to decrease the odds of an item being selected, move it down the list. You have a few options for how to move the item down the list; let's review the random-variable transformation first.
By applying the following function to a uniform random variable between 0 and 1:
index = Int(l*(1-r^(0.5)))  # l = length, r = uniform random var between 0 and 1
You get a cool distribution that drastically reduces the odds of a larger index. For a list of size 20 (l = 20):
p(0)=0.09751
p(1)=0.09246
p(2)=0.08769
p(3)=0.08211
p(4)=0.07636
p(5)=0.07325
p(6)=0.06772
p(7)=0.06309
p(8)=0.05813
p(9)=0.05274
p(10)=0.04808
p(11)=0.04205
p(12)=0.03691
p(13)=0.03268
p(14)=0.02708
p(15)=0.02292
p(16)=0.01727
p(17)=0.01211
p(18)=0.00736
p(19)=0.00249
Here is the distribution for a list of size 2
0.75139
0.24862
Size 3
0.55699
0.33306
0.10996
Size 4
0.43916
0.31018
0.18836
0.06231
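For reference, here is a minimal C# sketch of the transformation (PickIndex is a hypothetical helper name; the question asked for C#):

static int PickIndex( Random rand, int length )
{
    // index = floor(length * (1 - sqrt(r))), r uniform in [0, 1)
    var index = (int)( length * ( 1 - Math.Sqrt( rand.NextDouble() ) ) );
    return Math.Min( index, length - 1 ); // r = 0 would otherwise yield index == length
}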
Now let's discuss the two options for moving items down the list. I tested two (a C# sketch of both follows the list):
ToEnd - Move most recently selected item to end of list
Sort - Keep an associated array of the number of times each item has been selected and sort the list from least to most selected.
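A hedged sketch of both strategies, reusing the hypothetical PickIndex helper above (note that List<T>.Sort is not stable, which is fine here since ties can land in any order):

static T PickToEnd<T>( Random rand, List<T> items )
{
    var i = PickIndex( rand, items.Count );
    var item = items[i];
    items.RemoveAt( i );
    items.Add( item ); // the most recently selected item now has the lowest odds
    return item;
}

static T PickSorted<T>( Random rand, List<T> items, Dictionary<T, int> counts )
{
    var i = PickIndex( rand, items.Count );
    var item = items[i];
    counts[item] = counts.TryGetValue( item, out var n ) ? n + 1 : 1;
    // least-selected items migrate to the front, where the odds are highest
    items.Sort( (a, b) => counts.GetValueOrDefault( a ).CompareTo( counts.GetValueOrDefault( b ) ) );
    return item;
}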
I created a simulation that picks from a list and examines the standard deviation of the count of times each item was selected; the lower the standard deviation, the better. For example, one simulation for a list of 10 items where 50 selections were made produced this spread:
{"a"=>5, "b"=>5, "c"=>6, "d"=>5, "e"=>4, "f"=>4, "g"=>5, "h"=>5, "i"=>6, "j"=>5}
The standard deviation for this simulation was
0.63
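My simulations were run in another scripting language, but an equivalent C# harness might look like this, assuming the PickSorted sketch above (the method name Simulate is mine; using the population standard deviation reproduces the 0.63 figure above):

static double Simulate( string[] names, int picks, Random rand )
{
    var items = names.ToList();
    var counts = new Dictionary<string, int>();
    for( int p = 0; p < picks; p++ )
        PickSorted( rand, items, counts );

    // population standard deviation of the per-item selection counts
    var mean = (double)picks / names.Length;
    var sumSq = names.Sum( n => Math.Pow( counts.GetValueOrDefault( n ) - mean, 2 ) );
    return Math.Sqrt( sumSq / names.Length );
}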
With the ability to run a simulation, I then computed some meta-statistics by running the simulation 500 times and reporting the average standard deviation for each method, ToEnd and Sort. I expected the standard deviation to be high for a low number of picks, but in fact for the ToEnd algorithm the standard deviation increased with the number of picks. The Sort method fixed this.
Testing ["a", "b", "c", "d", "e"]
-------------------------
Picks ToEnd (StdDev) Sort (StdDev)
5 0.59 0.57
10 0.76 0.68
15 0.93 0.74
20 1.04 0.74
25 1.20 0.73
30 1.28 0.73
35 1.34 0.74
40 1.50 0.75
45 1.53 0.75
50 1.56 0.77
80 2.04 0.79
125 2.52 0.74
180 3.11 0.77
245 3.53 0.79
320 4.05 0.75
405 4.52 0.76
500 5.00 0.78
Here are some test results for a larger set.
Testing ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
-------------------------
Picks ToEnd (StdDev) Sort (StdDev)
10 0.68 0.65
20 0.87 0.77
30 1.04 0.80
40 1.18 0.82
50 1.30 0.85
60 1.43 0.84
70 1.57 0.87
80 1.65 0.88
90 1.73 0.87
100 1.71 0.87
160 2.30 0.89
250 2.83 0.88
360 3.44 0.88
490 3.89 0.88
640 4.54 0.87
810 5.12 0.88
1000 5.66 0.85
With a good test framework, I couldn't resist trying a different random number transformation. My assumption was that if I took the cube root of r instead of the square root, the standard deviation would decrease. It did, but I was worried that this would decrease the randomness. Here you can observe a few simulations when the formula is changed to
index = Int(l*(1-r^(0.33)))  # l = length, r = uniform random var between 0 and 1
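In the PickIndex sketch above this is a one-line change; a hypothetical overload that exposes the exponent (my own generalization) makes the trade-off explicit:

static int PickIndex( Random rand, int length, double exponent )
{
    // exponent 0.5 is the gentler square-root curve; 0.33 front-loads much harder
    var index = (int)( length * ( 1 - Math.Pow( rand.NextDouble(), exponent ) ) );
    return Math.Min( index, length - 1 ); // clamp the r = 0 edge case
}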
Now inspect the actual picks. As I thought, it's very weighted toward the beginning of the list at first. If you want to weight it this heavily, you should probably shuffle your list before you start.
StdDev = 0.632455532033676
{"a"=>10, "b"=>10, "c"=>11, "d"=>9, "e"=>10}
a d e c b c e b a d b e c a d d e b a e e c c b d a d c b c e b a a d d b e a e a b c b d c a c e c
StdDev = 0.0
{"a"=>10, "b"=>10, "c"=>10, "d"=>10, "e"=>10}
b c d a a d b c b a d e c d e c b e b a e e d c c a b a d a e e b d b a e c c e b a c c d d d a b e
StdDev = 0.632455532033676
{"a"=>9, "b"=>10, "c"=>10, "d"=>10, "e"=>11}
b d a e b c a d c e e b a c a d d c b c e d a e b b a c d c d a e a e e b d c b e a b c b c d d e e
StdDev = 0.0
{"a"=>10, "b"=>10, "c"=>10, "d"=>10, "e"=>10}
a b e d c e a b d c b c c a d b e e b e a d d c e a d b b c c a a a b e d d e c c a b b e a c d e d
StdDev = 0.632455532033676
{"a"=>11, "b"=>10, "c"=>9, "d"=>10, "e"=>10}
b a d c d c a e b e a e b c d b c a a d e e d c d e c b a b b e d c d b c e a a a d b c e b e a d a