With all the "distinct combination" and "Cartesian product" questions on SO, I'm sure there's a name and canonical solution for this one, but I'm not turning it up.
Update... Here's a potentially better example: Suppose a club has regular raffle events. Many items are raffled per event, and members buy tickets on a per-item basis. On raffle night, the raffle manager prints out batches of name cards, batch A, B, C and so on. As each item is raffled, he throws one of these pre-assembled batches into the hopper, mixes it up, and draws a name. After giving away the prize, the name goes back into the batch, which he reuses if any other item happens to have the same batch of contestants. Question: Is there a stateless algorithm that can assemble the batches of name cards, printing the minimum total number of cards? [If not, Chris Shain's HashSet<> example is the most efficient stateful alternative I'm aware of.]
Original question and examples: Consider the following lists of people, sandwiches and allergies (stored relationally; these data structures are just to keep the posting short, and aren't intrinsic to the question or solution):
var people = { "Pete", "Barb", "Debbie", "Frank", "Ralph", "Sally" };
var sandwiches = { "Peanut Butter", "Egg Salad", "Tuna Salad", "Oven Roasted Chicken", "Gluten-free Twigs" };
var allergies = {
{ "Pete", null },
{ "Barb", { "Peanut Butter" } },
{ "Debbie", { "Peanut Butter", "Egg Salad", "Tuna Salad" } },
{ "Frank", { "Egg Salad", "Tuna Salad" } },
{ "Ralph", { "Oven Roasted Chicken" } },
{ "Sally", { "Egg Salad", "Tuna Salad" } } };
To find the people who can eat a given sandwich, I can of course iterate through the sandwiches (outer) and people (inner) easily enough and check for allergies.
What I want, though, is to precalculate and publish the smallest list of non-allergic person sets that would cover all the sandwiches (people will obviously belong to more than one set), with no more than one set of people for any sandwich, and maximizing re-use, e.g., the set [Pete, Barb, Debbie, Frank, Sally] would cover both Gluten Free Twigs and Oven Roasted Chicken.
For an example, say there's a list of sandwiches to be raffled off. The cook makes one, then needs to find out who's in on the raffle (everyone who's not allergic). I want the least repetitious bunch of rubber-banded sets of name cards, bundle A, B, C and so on, such that one could have a list of sandwiches, each indicating which bundle of name cards to throw in the hat for that sandwich. Imagine that the name card paper is really expensive. (Obviously I've changed the problem domain for the sake of example.)
I'm doing this now using the equivalent of a hashtable of person sets, then stuffing pointers to those sets in a dictionary keyed by sandwich. It works just fine, but it feels inelegant.
Thanks to anyone who can name this problem and point me towards a prettier (or more textbook) approach.
Update: I am achieving the desired end result using the equivalent of MySQL's GROUP_CONCAT. This isn't ideal, but I'm adding it because it clarifies the desired end result. In pseudocode:
// SandwichPeople = the sandwich list with a concatenated list of
// people who can eat it:
SELECT Sandwich.SandwichName, GROUP_CONCAT(Person.FullName SEPARATOR ', ') as MemberNames
FROM Sandwich JOIN Person on [...not allergic...]
// SandwichRoster = distinct People from SandwichPeople with auto id
INSERT IGNORE INTO SandwichRoster (MemberNames)
SELECT DISTINCT MemberNames from SandwichPeople
// Match sandwiches with rosters:
SELECT SandwichPeople.SandwichName, SandwichRoster.ID
FROM SandwichPeople
JOIN SandwichRoster on SandwichPeople.MemberNames = SandwichRoster.MemberNames
Create a dictionary of string keys and HashSet<string>
values. Iterate over the person->allergy dictionary once, and for each allergy, get or create a record in the dictionary for that allergy:
// A dictionary containing the set of people who are allergic to any given thing
var allergyLookup = new Dictionary<String, HashSet<String>>();
allergies.ForEach(kvp => {
var allergicSet = allergyLookup.ContainsKey(kvp.Value) ? allergyLookup[kvp.Value] : allergyLookup[kvp.Value] = new HashSet<String>();
allergicSet.Add(kvp.Key);
}
Then when you need to look up the people allergic to a set of ingredients, you can use the fast set-based ExceptWith function:
var ingredients = { "Tuna", "Peanut Butter" };
var peopleWhoCanEatThis = new HashSet<String>(allPeople);
ingredients.ToList().ForEach(i => peopleWhoCanEatThis.ExceptWith(allergyLookup[i]));
HashSet's ExceptWith() function is far faster than the generic one, because it is set-based and can do fixed-time lookups rather than linear-time lookups.
EDIT: Mistakenly used the Except function- the fast set subtraction is ExceptWith: http://msdn.microsoft.com/en-us/library/bb299875.aspx
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With