I have a set of values, and an associated percentage for each:
a: 70% chance
b: 20% chance
c: 10% chance
I want to select a value (a, b, c) based on the percentage chance given.
how do I approach this?
my attempt so far looks like this:
r = random.random() if r <= .7: return a elif r <= .9: return b else: return c
I'm stuck coming up with an algorithm to handle this. How should I approach this so it can handle larger sets of values without just chaining together if-else flows.
(any explanation or answers in pseudo-code are fine. a python or C# implementation would be especially helpful)
To perform a weighted average calculation you multiply each value (percentage mark) by its corresponding weight and then add all the results together. You then divide this answer by the sum of the weights.
To calculate how much weight you need, divide the known population percentage by the percent in the sample. For this example: Known population females (51) / Sample Females (41) = 51/41 = 1.24. Known population males (49) / Sample males (59) = 49/59 = .
Here is a complete solution in C#:
public class ProportionValue<T> { public double Proportion { get; set; } public T Value { get; set; } } public static class ProportionValue { public static ProportionValue<T> Create<T>(double proportion, T value) { return new ProportionValue<T> { Proportion = proportion, Value = value }; } static Random random = new Random(); public static T ChooseByRandom<T>( this IEnumerable<ProportionValue<T>> collection) { var rnd = random.NextDouble(); foreach (var item in collection) { if (rnd < item.Proportion) return item.Value; rnd -= item.Proportion; } throw new InvalidOperationException( "The proportions in the collection do not add up to 1."); } }
Usage:
var list = new[] { ProportionValue.Create(0.7, "a"), ProportionValue.Create(0.2, "b"), ProportionValue.Create(0.1, "c") }; // Outputs "a" with probability 0.7, etc. Console.WriteLine(list.ChooseByRandom());
For Python:
>>> import random >>> dst = 70, 20, 10 >>> vls = 'a', 'b', 'c' >>> picks = [v for v, d in zip(vls, dst) for _ in range(d)] >>> for _ in range(12): print random.choice(picks), ... a c c b a a a a a a a a >>> for _ in range(12): print random.choice(picks), ... a c a c a b b b a a a a >>> for _ in range(12): print random.choice(picks), ... a a a a c c a c a a c a >>>
General idea: make a list where each item is repeated a number of times proportional to the probability it should have; use random.choice
to pick one at random (uniformly), this will match your required probability distribution. Can be a bit wasteful of memory if your probabilities are expressed in peculiar ways (e.g., 70, 20, 10
makes a 100-items list where 7, 2, 1
would make a list of just 10 items with exactly the same behavior), but you could divide all the counts in the probabilities list by their greatest common factor if you think that's likely to be a big deal in your specific application scenario.
Apart from memory consumption issues, this should be the fastest solution -- just one random number generation per required output result, and the fastest possible lookup from that random number, no comparisons &c. If your likely probabilities are very weird (e.g., floating point numbers that need to be matched to many, many significant digits), other approaches may be preferable;-).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With