Efficient algorithm to randomly select items with frequency

Question

Given an array of n word-frequency pairs:

[ (w₀, f₀), (w₁, f₁), ..., (w_n-1, f_n-1) ]

where w_i is a word, f_i is an integer frequencey, and the sum of the frequencies ∑f_i = m,

I want to use a pseudo-random number generator (pRNG) to select p words w_j₀, w_j₁, ..., w_{j_p-1} such that the probability of selecting any word is proportional to its frequency:

P(w_i = w_{j_k}) = P(i = j_k) = f_i / m

(Note, this is selection with replacement, so the same word could be chosen every time).

I've come up with three algorithms so far:

Create an array of size m, and populate it so the first f₀ entries are w₀, the next f₁ entries are w₁, and so on, so the last f_p-1 entries are w_p-1.
```
[ w₀, ..., w₀, w₁,..., w₁, ..., w_p-1, ..., w_p-1 ]
```
Then use the pRNG to select p indices in the range 0...m-1, and report the words stored at those indices.
This takes O(n + m + p) work, which isn't great, since m can be much much larger than n.
Step through the input array once, computing
```
m_i = ∑_h≤if_h = m_i-1 + f_i
```
and after computing m_i, use the pRNG to generate a number x_k in the range 0...m_i-1 for each k in 0...p-1 and select w_i for w_{j_k} (possibly replacing the current value of w_{j_k}) if x_k < f_i.
This requires O(n + np) work.
Compute m_i as in algorithm 2, and generate the following array on n word-frequency-partial-sum triples:
```
[ (w₀, f₀, m₀), (w₁, f₁, m₁), ..., (w_n-1, f_n-1, m_n-1) ]
```
and then, for each k in 0...p-1, use the pRNG to generate a number x_k in the range 0...m-1 then do binary search on the array of triples to find the i s.t. m_i-f_i ≤ x_k < m_i, and select w_i for w_{j_k}.
This requires O(n + p log n) work.

My question is: Is there a more efficient algorithm I can use for this, or are these as good as it gets?

seb · Accepted Answer

This sounds like roulette wheel selection, mainly used for the selection process in genetic/evolutionary algorithms.

Look at Roulette Selection in Genetic Algorithms

Guffa · Answer

You could create the target array, then loop through the words determining the probability that it should be picked, and replace the words in the array according to a random number.

For the first word the probability would be f₀/m₀ (where m_n=f₀+..+f_n), i.e. 100%, so all positions in the target array would be filled with w₀.

For the following words the probability falls, and when you reach the last word the target array is filled with randomly picked words accoding to the frequency.

Example code in C#:

public class WordFrequency {

    public string Word { get; private set; }
    public int Frequency { get; private set; }

    public WordFrequency(string word, int frequency) {
        Word = word;
        Frequency = frequency;
    }

}

WordFrequency[] words = new WordFrequency[] {
    new WordFrequency("Hero", 80),
    new WordFrequency("Monkey", 4),
    new WordFrequency("Shoe", 13),
    new WordFrequency("Highway", 3),
};

int p = 7;
string[] result = new string[p];
int sum = 0;
Random rnd = new Random();
foreach (WordFrequency wf in words) {
    sum += wf.Frequency;
    for (int i = 0; i < p; i++) {
        if (rnd.Next(sum) < wf.Frequency) {
            result[i] = wf.Word;
        }
    }
}

Efficient algorithm to randomly select items with frequency

Tags:

algorithm

big-o

random

rampion

2 Answers

seb

Guffa

Recent Activity

Donate For Us

Efficient algorithm to randomly select items with frequency

Tags:

algorithm

big-o

random

rampion

2 Answers

seb

Guffa

Related questions

Recent Activity

Donate For Us