Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Grouping algorithm for combinations

Tags:

c#

algorithm

linq

Let's say I have a list of items, each item is defined by a simple structure

struct simpleItem
{
    String Category1;
    String Category2;
    ...
    String CategoryN;
}

Each item have a series of values belonging to some categories. The number of categories N is known at the time of processing a list and each items have the same amount of categories and only one value per category, there are no duplicate items. However, each list can have a different category set.

I'm looking for a way to group these items by categories in a way that if those groups are deconstructed into single items by combining each permutations of categories, I'll end up with the original combinaisons, with no duplicates.

A group result will be:

struct grouped
{
    String[] Category1;
    String[] Category2;
    ...
    String[] CategoryN;
}

Example

For the sake of this example, we'll limit to 3 categories, but there can be N.

Categories

Animal, Eye Color, Fur

Choices for "Animal" category: Cat, Dog, Rat, Horse

Choices for "Eye Color" category: Blue, Yellow, Green, Red, Orange

Choices for "Fur" category: Long, Short, Curly

If the list had all the permutations for those 3 categories, the end result would be

Group 1 :
Animal      [Cat, Dog, Rat, Horse]
Eye Color [Blue, Yellow, Green, Red, Orange]
Fur            [Long, Short, Curly]

If I have a sublist, for example:

  1. Cat, Blue, Long
  2. Cat, Blue, Short
  3. Dog, Blue, Long
  4. Dog, Blue, Short
  5. Dog, Green Long
  6. Rat, Red, Short
  7. Rat, Blue, Short

Let's call this list, Input (A)

After grouping these items into group we could end up with : (there could be other possibilities). The grouping criteria would be to have the less output groups as possible.

Group 1:
Animal      [Cat,     Dog]
Eye Color [Blue           ]
Fur           [Long, Short]

Group 2:
Animal      [Dog]
Eye Color [Green ]
Fur           [Long]

Group 3:
Animal      [Rat           ]
Eye Color [Red, Blue]
Fur           [Short        ]

Let's call these groups, Output (B)

As we can see, by combining each items of the resulting groups, we'll end back to the original input list of 7 elements in (A).

Question

So, I'm trying to write an algorithm that generates these groups. I am trying to do this with LINQ, but I am also open for other suggestions. Any suggestion on how to get to (B) from (A)?

like image 587
Mathieu Avatar asked Aug 26 '13 15:08

Mathieu


1 Answers

  1. Take each input and treat it as it's own group.
    • So, for example, Cat, Blue, Long becomes the group of [Cat], [Blue], [Long], which each category having exactly one item.
  2. Go through each group in your list, starting with the first. Pair it with each other group in the list. Combine those pairs of groups into a single group if they meet the appropriate criteria.
    • The criteria for merging groups is if the set of values for n-1 of the categories are the same, and exactly one category set does not match. If that is the case, create a new group with the n-1 similar categories being the same, and the remaining category the intersection of the sets.
  3. If you find a match, stop comparing pairs and start over with the first item in the first group. (Using deferred execution here helps you, so that you don't bother pairing up the groups as soon as you find a match.)
  4. If you go through the entire set without finding a match then you are done, there are no more combinations to be made.

So, walking through your example. First it would pair up the first and second groups. The first two category sets are the same, the third is different, so they can be merged. You now have a list that is:

  1. [Cat], [Blue], [Long, Short]
  2. [Dog], [Blue], [Long]
  3. [Dog], [Blue], [Short]
  4. [Dog], [Green], [Long]
  5. [Rat], [Red], [Short]
  6. [Rat], [Blue], [Short]

Next we compare the (new) first and second groups. Both the first and third categories don't match, no merge. Next we compare the first and third, again, same two categories won't match. The first group won't match any others. So now we move onto the second group. We pair it up with the third. It can be merged, as the first two categories are different:

  1. [Cat], [Blue], [Long, Short]
  2. [Dog], [Blue], [Long, Short]
  3. [Dog], [Green], [Long]
  4. [Rat], [Red], [Short]
  5. [Rat], [Blue], [Short]

Now we start over, pairing the first and second groups. They match. The first category is different, the second the same, and the third the same. It is now:

  1. [Cat, Dog], [Blue], [Long, Short]
  2. [Dog], [Green], [Long]
  3. [Rat], [Red], [Short]
  4. [Rat], [Blue], [Short]

We'd now compare the first to each of the other three, none will match. We then compare the second to the other two, none will match. Finally the third and fourth will match, as only the second category is different:

  1. [Cat, Dog], [Blue], [Long, Short]
  2. [Dog], [Green], [Long]
  3. [Rat], [Red, Blue], [Short]

Finally we'll go through all of the combinations, none of the groups will match merge conditions, and we're done.

like image 95
Servy Avatar answered Oct 12 '22 22:10

Servy