Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to pick 4 unique items from a weighted list?

So I've got a list of weighted items, and I'd like to pick 4 non-duplicate items from this list.

Item     Weight
Apple     5
Banana    7
Cherry    12
...
Orange    8
Pineapple 50

What is the most efficient way to do this? My initial foray was to just reroll for the subsequent picks if an already picked item came up... but for a small list this can result in a ton of rerolls.

Edit for clarification: For the above example, and ignoring fruits D through N, there is a total weight of 82. So the chances of being picked first are: A ~6% B ~8.5% C ~14.6% O ~9.8% P ~61% Once an items is picked the probabilities would (should!) change.

like image 738
aslum Avatar asked Feb 24 '23 19:02

aslum


1 Answers

In your comment you say that unique means:

I don't want to pick the same item twice.

.. and that the weights determine a likelihood of being picked.

All you need to do to make sure that you don't pick duplicates, is simply remove the last picked item from the list before you pick the next one. Yes, this will change your weights slightly, but that is the correct statistical change to make if you do want unique results.


In addition, I'm not sure how you are using the weights to determine the candidates, but I came up with this algorithm that should do this with a minimal number of loops (and without the need to fill an array according to weights, which could result in extremely large arrays, requires int weights, etc.)

I've used JavaScript here, simply so it's easy to see the output in a browser without a server. It should be trivial to port to PHP since it's not doing anything complicated.

Constants

var FRUITS = [
    {name : "Apple", weight: 8 },
    {name : "Orange", weight: 4 },
    {name : "Banana", weight: 4 },
    {name : "Nectarine", weight: 3 },
    {name : "Kiwi", weight: 1 }
];

var PICKS = 3;

function getNewFruitsAvailable(fruits, removeFruit) {
    var newFruits = [];
    for (var idx in fruits) {
        if (fruits[idx].name != removeFruit) {
            newFruits.push(fruits[idx]);
        }
    }
    return newFruits;
}

Script

var results = [];
var candidateFruits = FRUITS;

for (var i=0; i < PICKS; i++) {
    // CALCULATE TOTAL WEIGHT OF AVAILABLE FRUITS
    var totalweight = 0;
    for (var idx in candidateFruits) {
        totalweight += candidateFruits[idx].weight;
    }
    console.log("Total weight: " + totalweight);

    var rand = Math.random();

    console.log("Random: " + rand);

    // ITERATE THROUGH FRUITS AND PICK THE ONE THAT MATCHES THE RANDOM
    var weightinc = 0;
    for (idx in candidateFruits) {
        // INCREMENT THE WEIGHT BY THE NEXT FRUIT'S WEIGHT
        var candidate = candidateFruits[idx];
        weightinc += candidate.weight;

        // IF rand IS BETWEEN LAST WEIGHT AND NEXT WEIGHT, PICK THIS FRUIT
        if (rand < weightinc/totalweight) {
            results.push(candidate.name);
            console.log("Pick: " + candidate.name);

            // GET NEXT SET OF FRUITS (REMOVING PICKED FRUIT)
            candidateFruits = getNewFruitsAvailable(candidateFruits, candidate.name);
            break;
        }
    }
    console.log("CandidateFruits: " + candidateFruits.length);
};

Output

for (var i=0; i < results.length; i++) {
    document.write(results[i] + "<br/>");
}

The basic strategy is to allocate each fruit a portion of the total range [0,1). In the first loop, you'd have this:

  • Apple — 8/20 = 0.0 up to 0.4
  • Orange — 4/20 = 0.4 up to 0.6
  • Banana — 4/20 = 0.6 up to 0.8
  • Nectarine — 3/20 = 0.8 up to 0.95
  • Kiwi — 8/20 = 0.95 up to 1.0

The script iterates over each item in the list, and progresses a weight counter. When it reaches the range that contains the first random, it picks that item, removes it from the list, then recalculates the ranges based on the new total weight and runs again.

like image 94
Nicole Avatar answered Mar 03 '23 09:03

Nicole