Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select x random elements from a weighted list in C# (without replacement)

Update: my problem has been solved, I updated the code source in my question to match with Jason's answer. Note that rikitikitik answer is solving the issue of picking cards from a sample with replacement.

I want to select x random elements from a weighted list. The sampling is without replacement. I found this answer: https://stackoverflow.com/a/2149533/57369 with an implementation in Python. I implemented it in C# and tested it. But the results (as described below) were not matching what I expected. I've no knowledge of Python so I'm quite sure I made a mistake while porting the code to C# but I can't see where as the code in Pythong was really well documented.

I picked one card 10000 times and this is the results I obtained (the result is consistent accross executions):

Card 1: 18.25 % (10.00 % expected)
Card 2: 26.85 % (30.00 % expected)
Card 3: 46.22 % (50.00 % expected)
Card 4: 8.68 % (10.00 % expected)

As you can see Card 1 and Card 4 have both a weigth of 1 but Card 1 is awlays picked way more often than card 4 (even if I pick 2 or 3 cards).

Test data:

var cards = new List<Card>
{
    new Card { Id = 1, AttributionRate = 1 }, // 10 %
    new Card { Id = 2, AttributionRate = 3 }, // 30 %
    new Card { Id = 3, AttributionRate = 5 }, // 50 %
    new Card { Id = 4, AttributionRate = 1 }, // 10 %
};

Here is my implementation in C#

public class CardAttributor : ICardsAttributor
{
    private static Random random = new Random();

    private List<Node> GenerateHeap(List<Card> cards)
    {
        List<Node> nodes = new List<Node>();
        nodes.Add(null);

        foreach (Card card in cards)
        {
            nodes.Add(new Node(card.AttributionRate, card, card.AttributionRate));
        }

        for (int i = nodes.Count - 1; i > 1; i--)
        {
            nodes[i>>1].TotalWeight += nodes[i].TotalWeight;
        }

        return nodes;
    }

    private Card PopFromHeap(List<Node> heap)
    {
        Card card = null;

        int gas = random.Next(heap[1].TotalWeight);
        int i = 1;

        while (gas >= heap[i].Weight)
        {
            gas -= heap[i].Weight;
            i <<= 1;

            if (gas >= heap[i].TotalWeight)
            {
                gas -= heap[i].TotalWeight;
                i += 1;
            }
        }

        int weight = heap[i].Weight;
        card = heap[i].Value;

        heap[i].Weight = 0;

        while (i > 0)
        {
            heap[i].TotalWeight -= weight;
            i >>= 1;
        }

        return card;
    }

    public List<Card> PickMultipleCards(List<Card> cards, int cardsToPickCount)
    {
        List<Card> pickedCards = new List<Card>();

        List<Node> heap = GenerateHeap(cards);

        for (int i = 0; i < cardsToPickCount; i++)
        {
            pickedCards.Add(PopFromHeap(heap));
        }

        return pickedCards;
    }
}

class Node
{
    public int Weight { get; set; }
    public Card Value { get; set; }
    public int TotalWeight { get; set; }

    public Node(int weight, Card value, int totalWeight)
    {
        Weight = weight;
        Value = value;
        TotalWeight = totalWeight;
    }
}

public class Card
{
    public int Id { get; set; }
    public int AttributionRate { get; set; }
}
like image 389
Gabriel Avatar asked Aug 02 '12 10:08

Gabriel


People also ask

How do you do weighted random selection?

For random selection with particular weights, the following technique can then be used: Generate a random number between 0 and s u m ( w ) − 1 sum(w)-1 sum(w)−1. Find the smallest index that corresponds to the prefix sum greater than the randomly chosen number.

What is weighted random sampling?

In weighted random sampling (WRS) the items are weighted and the probability of each item to be selected is determined by its relative weight.

What is weighted choice?

Weighted random choices mean selecting random elements from a list or an array by the probability of that element. We can assign a probability to each element and according to that element(s) will be selected.


2 Answers

There are two minor bugs in the program. First, the range of the random number should be exactly equal to the total weight of all the items:

int gas = random.Next(heap[1].TotalWeight);

Second, change both places where it says gas > to say gas >=.

(The original Python code is OK because gas is a floating-point number, so the difference between > and >= is negligible. That code was written to accept either integer or floating-point weights.)

Update: OK, you made the recommended changes in your code. I think that code is correct now!

like image 189
Jason Orendorff Avatar answered Oct 29 '22 02:10

Jason Orendorff


As some people have mentioned in the comments, create a list of the cards in the exact proportion that you want:

var deck = new List<Card>();

cards.ForEach(c => 
{
    for(int i = 0; i < c.AttributionRate; i++)
    {
         deck.Add(c);
    }
}

Shuffle:

deck = deck.OrderBy(c => Guid.NewGuid()).ToList();

And pick x cards:

var hand = deck.Take(x)

Of course this only works if AttributionRate is an int. Otherwise, you would have to tinker with the deck generation a bit.

I get the following results for 10,000 runs taking 5 at a time:

Card 1: 9.932% 
Card 2: 30.15% 
Card 3: 49.854% 
Card 4: 10.064% 

Another result:

Card 1: 10.024%
Card 2: 30.034%
Card 3: 50.034% 
Card 4: 9.908% 

EDIT:

I've braved the bitwise operations and I have taken a look at your code. After adding a generous amount of barbecue sauce on my fried brain, I noticed a few things:

First, Random.Next(min,max) will include min in the random pool, but not max. This is the reason for the higher than expected probability for Card 1.

After doing that change, I implemented your code and it appears to be working when you draw 1 card.

Card 1: 10.4%  
Card 2: 32.2% 
Card 3: 48.4% 
Card 4: 9.0% 

Card 1: 7.5%
Card 2: 28.1%
Card 3: 50.0% 
Card 4: 14.4% 

HOWEVER, your code will not work when you draw more than 1 card because of this statement:

heap[i].Weight = 0;

That line, and the recalculation loop after that, essentially removes all instances of the drawn card from the heap. If you happen to draw four cards, then the percentage becomes 25% for all cards since you're basically drawing all 4 cards. The algorithm, as it is, is not completely applicable to your case.

I suspect you would have to recreate the heap every time you draw a card, but I doubt it would still perform as well. If I were to work on this though, I would just generate 4 distinct random numbers from 1 to heap[1].TotalWeight and get the 4 corresponding cards from there, although the random number generation in this case might become unpredictable (rerolling) and thus inefficient.

like image 24
rikitikitik Avatar answered Oct 29 '22 01:10

rikitikitik