Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Constrained weighted choice

I have a weighted choice algorithm that works, but I'd like to improve it in two aspects (in order of importance):

  1. Guarantee that a minimum number from each possible choice is chosen.
  2. Reduce computational complexity from O(nm) to O(n) or O(m), where n is the requested number of randomly chosen items and m is types of items available.

Edit: The number of requested numbers is typically small (less than 100) for my purposes. As such, algorithms with complexity O(t) or O(t+n), where t is the total number of items, generally perform worse than O(nm) due to O(t) > O(m).

Simplified code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Security.Cryptography;

public class Program
{
    static void Main(string[] args)
    {
        // List of items with discrete availability
        // In this example there is a total of 244 discrete items and 3 types, 
        // but there could be millions of items and and hundreds of types. 
        List<Stock<string>> list = new List<Stock<string>>();
        list.Add(new Stock<string>("Apple", 200));
        list.Add(new Stock<string>("Coconut", 2));
        list.Add(new Stock<string>("Banana", 42));

        // Pick 10 random items
        // Chosen with equal weight across all types of items
        foreach (var item in Picker<string>.PickRandom(10, list))
        {
            // Do stuff with item
            Console.WriteLine(item);
        }
    }
}

// Can be thought of as a weighted choice
// where (Item Available) / (Sum of all Available) is the weight.
public class Stock<T>
{
    public Stock(T item, int available)
    {
        Item = item;
        Available = available;
    }
    public T Item { get; set; }
    public int Available { get; set; }
}

public static class Picker<T>
{
    // Randomly choose requested number of items from across all stock types
    // Currently O(nm), where n is requested number of items and m is types of stock
    // O(n) or O(m) would be nice, which I believe is possible but not sure how
    // O(1) would be awesome, but I don't believe it is possible
    public static IEnumerable<T> PickRandom(int requested, IEnumerable<Stock<T>> list)
    {
        // O(m) : to calcuate total items,
        // thus implicitly have per item weight -> (Item Available) / (Total Items)
        int sumAll = list.Sum(x => x.Available);

        // O(1)
        if (sumAll < requested)
            throw new ArgumentException("Requested amount must not exceed total available");

        // O(1)
        Random rng = new Random(Seed());

        // O(n) for the loop alone : O(nm) total
        for (int n = 0; n < requested; n++)
        {
            // O(1) to choose an item : uses implicit ordering
            int choice = rng.Next(1, sumAll);
            int current = 0;

            // O(m) to find the chosen item
            foreach (Stock<T> stock in list)
            {
                current += stock.Available;

                if (current >= choice)
                {
                    yield return stock.Item;

                    // O(1) to re-calculate weight once item is found
                    stock.Available -= 1;
                    sumAll--;

                    break;
                }
            }
        }
    }

    // Sufficiently random seed
    private static int Seed()
    {
        byte[] bytes = new byte[4];
        new RNGCryptoServiceProvider().GetBytes(bytes);
        return bytes[0] << 24 | bytes[1] << 16 | bytes[2] << 8 | bytes[3];
    }
}

The function PickRandom() uses yield return and IEnumerable, but is not required. I was just trying to be clever when I first wrote the function, so that it could iterate through anything (even say a enumerable from a LINQ to SQL query). Afterwards, I discovered that while the flexibility is nice, I never truly needed it.

My first thought on resolving point #1 (guarantee that a minimum number from each possible choice is chosen) would be to choose the required minimum from each type in an entirely non-random way, use my existing algorithm to pick the remaining unconstrained part, then shuffle the results together. This seemed the most natural and mimics how I would do something like this in real life, but I'm thinking it's not the most efficient way.

My second idea was to create a result array first, randomly choose indexes to fill in the required minimums first, then fill in the rest using my existing algorithm, but in all my attempts this ended up increasing the "big O" complexity or as a big mess of indexes being recorded everywhere. I still think this approach is possible, I just haven't been able to work it out yet.

Then decided to come here, since this problem seems like it could be abstracted to a fairly generic algorithm, but all the keywords I use to search usually point me to basic weighted random number generation (as opposed to choosing discrete items grouped by type with specific availability). And have not been able to find any that constrain the problem with a minimum choice per item type, while still remaining randomized. So I'm hoping either someone bright can see a simple efficient solution, or someone who's heard this problem before knows some better keywords than I and can point me in the right direction.

like image 227
Sybeus Avatar asked Aug 24 '11 06:08

Sybeus


1 Answers

Here's a rough idea; I'm sure it can be further improved:

  1. Assume that each available item has a unique ID in the range [0..sumAll[ where sumAll is the number of items available. So the first apple has ID 0, the last apple has ID 199, the first coconut has ID 200, and so on. Determining sumAll and the subrange of each type is O(m) where m is the number of types.

  2. Pick a random ID (with all IDs having the same weight). Repeat this until you have a set of 10 different IDs. This is O(n) where n is the number of items to pick.

  3. Determine the type of each picked ID using binary search. This is O(n log m).

  4. Remove the picked items from the available items. This is O(m).

To guarantee a minimum number of items picked for each type, picking these and removing them from the available items before step 1 sounds like a good idea. This is O(m).

like image 66
dtb Avatar answered Nov 01 '22 22:11

dtb