Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count of combinations with repetitions

I have a very inefficient way of counting the combinations of N/2 items from an array of size N. What I do is sort the array to begin with and then loop through the the permutations of the array, creating multisets with half the elements and insert that into a set. Finally I get the count of the set.

long GetCombinations(std::vector<double> nums) {
    long combinations = 0;
    std::sort(nums.begin(), nums.end());
    std::set<std::multiset<double>> super_set;

    do {
        std::multiset<double> multi_set;

        for (unsigned int i = 0; i < nums.size() / 2; ++i)
            multi_set.insert(nums[i]);

        auto el = (super_set.insert(multi_set));

        if (el.second)
            ++combinations;

    } while (std::next_permutation(nums.begin(), nums.end()));

    return combinations;
}

The code works but it is very inefficient. For the given array [0.5, 0.5, 1, 1] there are 3 combinations of size 2:

0.5, 0.5
1, 1
1, 0.5

Is there a different algorithm or approach that can increase the speed of this code?

like image 312
jignatius Avatar asked Jun 20 '18 13:06

jignatius


People also ask

How do you find the number of repetition combinations?

The number of k-element combinations of n objects, with repetition is Cn,k = Cn+k-1,k = (n + k − 1 k ) = ((n k )) . It is also the number of all ways to put k identical balls into n distinct boxes, or the number of all functions from a set of k identical elements to a set of n distinct elements.

What does combination with repetition mean?

Combinations with repetition A combination with repetition of objects from is a way of selecting objects from a list of. . The selection rules are: the order of selection does not matter (the same objects selected in different orders are regarded as the same combination); each object can be selected more than once.

How many combinations can you make with the numbers 1 2 3 with repetition?

Answer: 125 As repetition is allowed, So the number of digits available for B and C will also be 5 (each).


1 Answers

Counting Combinations

Generally speaking, determining the number of combinations of a particular set is quite trivial. However, extending this to a multiset where each element is repeated a specific number of times is considerably more difficult and not as well documented. @WorldSEnder linked to a math/stackexchange answer that has a comment with a link to this wonderful article in combinatorics called Combinatorial Generation by Frank Ruskey. If you go to page 71, there is a section that treats this topic with greater rigor.

Basic definitions

  1. Set - A collection of distinct objects. - E.g. {a, b} is the same as {a, a, b} and both have cardinality 2
  2. Multiset - Similar to a set, but allows for duplicate entries. - E.g. {a, b} and {a, a, b} are different multisets with cardinality 2 and 3 respectively
  3. Binomial Coefficient - Gives the number of k-element subsets of an n-element set.
  4. Multiset Coefficient/Number - The number of multisets of cardinality k with elements taken from a finite set.

Misconceptions

There is a belief that there is a simple formula that will quickly calculate the number of combinations for a multiset of length k where each element is repeated a specific number of times (see the highly upvoted comments above). Below, we examine each of the well-known methods.

Let us start with the general application of the binomial coefficient. We immediately see that this will fail as it is strictly meant to calculate the number of combinations of a set, where duplicate entries are not allowed. In our case, duplicates are allowed.

Reading further down on the Wikipedia page, there is a section called Number of combinations with repetition. This looks promising as we do have some replication. We also see a modified binomial coefficient, which seems even more promising. A closer look reveals that this too will fail as this strictly applies to multisets where each element is repeated up to k times.

Lastly, we try out the multiset coefficient. One of the examples listed looks very similar to what we are trying to accomplish.

"First, consider the notation for multisets that would represent {a, a, a, a, a, a, b, b, c, c, c, d, d, d, d, d, d, d} (6 as, 2 bs, 3 cs, 7 ds) in this form:"

This looks like a good candidate for what we are trying to derive. However, you will see that they continue to derive the number of ways you can construct a multiset of cardinality 18 from a set of 4 distinct elements. This is equivalent to the number of integer compositions of 18 of length 4. E.g.

18 + 0 + 0 + 0
17 + 1 + 0 + 0
16 + 2 + 0 + 0
       .
       .
       .
5 +  4 + 6 + 3
4 +  5 + 6 + 3
3 +  6 + 6 + 3
       .
       .
       .
0 +  1 + 0 + 17
0 +  0 + 1 + 17
0 +  0 + 0 + 18

As you can see, order matters with compositions which clearly doesn't apply to our situation.

The last two methods mentioned are derived from the renowned Stars and Bars method for simple counting problems. As far as I can tell, this method cannot easily be extended to our case.

A Working Algorithm

unsigned long int getCombinationCount(std::vector<double> nums) {

    unsigned long int n = nums.size();
    unsigned long int n2 = n / 2;
    unsigned long int numUnique = 1;
    unsigned long int numCombinations;

    std::sort(nums.begin(), nums.end());
    std::vector<int> numReps;

    double testVal = nums[0];
    numReps.push_back(1);

    for (std::size_t i = 1; i < n; ++i) {
        if (nums[i] != testVal) {
            numReps.push_back(1);
            testVal = nums[i];
            ++numUnique;
        } else {
            ++numReps[numUnique - 1];
        }
    }

    int myMax, r = n2 + 1;
    std::vector<double> triangleVec(r);
    std::vector<double> temp(r);
    double tempSum;

    myMax = r;
    if (myMax > numReps[0] + 1)
        myMax = numReps[0] + 1;

    for (int i = 0; i < myMax; ++i)
        triangleVec[i] = 1;

    temp = triangleVec;

    for (std::size_t k = 1; k < numUnique; ++k) {
        for (int i = n2; i > 0; --i) {
            myMax = i - numReps[k];
            if (myMax < 0)
                myMax = 0;

            tempSum = 0;
            for (int j = myMax; j <= i; ++j)
                tempSum += triangleVec[j];

            temp[i] = tempSum;
        }
        triangleVec = temp;
    }

    numCombinations = (unsigned long int) triangleVec[n2];

    return numCombinations;
}

Explanation using a Modified Pascal's Triangle

The entries in the traditional Pascal's Triangle (PT from here on out) represent the binomial coefficient where the row of the triangle is the number of elements in your set and the column is the length of combinations you wish to generate. The construction of the triangle is the key to how we will attack the problem at hand.

If you note that for the traditional PT, to obtain a particular entry, say (i, j) where i is the row and j is the column, you must add the entries (i - 1, j - 1) and (i - 1, j). Here is an illustration.

                  1
                1   1
              1   2   1            N.B. The first 10 is in the 5th row and 3rd column
            1   3   3   1               and is obtained by adding the entries from the
          1   4   6   4   1             4th row and 2nd/3rd.
        1   5   10  10  5   1
      1   6   15  20  15  6   1

We can extend this to a general multiset where each element is repeated a specific number of times. Let us consider a couple of examples.

Example 1: v1 = {1, 2, 2}, v2 = {1, 2, 2, 3, 3, 3}, and v3 = {1,2,2,3,3,3,4,4,4,4}

Below we have all of the possible combinations of v1 choose 1 - 3 as well as v2 choose 1 - 6.

     [,1]                    [,1]
[1,]    1               [1,]    1
[2,]    2               [2,]    2
                        [3,]    3

     [,1] [,2]               [,1] [,2]
[1,]    1    2          [1,]    1    2
[2,]    2    2          [2,]    1    3
                        [3,]    2    2
                        [4,]    2    3
                        [5,]    3    3

     [,1] [,2] [,3]          [,1] [,2] [,3]
[1,]    1    2    2     [1,]    1    2    2
                        [2,]    1    2    3
                        [3,]    1    3    3
                        [4,]    2    2    3
                        [5,]    2    3    3
                        [6,]    3    3    3

                             [,1] [,2] [,3] [,4]
                        [1,]    1    2    2    3
                        [2,]    1    2    3    3
                        [3,]    1    3    3    3
                        [4,]    2    2    3    3
                        [5,]    2    3    3    3

                             [,1] [,2] [,3] [,4] [,5]
                        [1,]    1    2    2    3    3
                        [2,]    1    2    3    3    3
                        [3,]    2    2    3    3    3

                             [,1] [,2] [,3] [,4] [,5] [,6]
                        [1,]    1    2    2    3    3    3

Let us write down the number of combinations for all k for both v1 and v2.

2  2  1
3  5  6  5  3  1

I am going to give you the number of combinations for all k of v3 (I will leave it to the reader to enumerate them).

4  9 15 20 22 20 15  9  4  1

We combine the results above in a special way and note that things are starting to look very familiar.

         2  2  1
     3   5   6   5  3  1
4  9  15  20  22  20  15  9  4  1

We add a few ones as place holders to complete this modified PT

                1   1
            1   2   2   1
      1   3   5   6   5   3   1
1  4  9  15  20  22  20  15   9  4  1

What does this mean? It is somewhat clear that the numbers in each successive row are a combination of the numbers in the previous row. But how?....

We let the frequency of each element guide us.

For example, to obtain the third row representing the number of combinations of v2 choose 1 - 6 (ignoring the first 1), we look to row 2. Since the frequency of the 3rd element is 3, we add the 4 elements (3 + 1.. just like with binomial coefficients for finding number of combinations of sets with distinct elements, we add 2 entries to together or 1 + 1) in the row above with the column less than or equal to the column we are finding. So we have:

if the column index is non-positive or greater than the 
number of columns in the previous row, the value is 0

    v2 choose 3
(3, 2) =  (2, 2 - 3) + (2, 2 - 2) + (2, 2 - 1) + (2, 2 - 0)
       =       0     +      0     +      1     +    2 
       =   3

v2 choose 4           
(3, 3) =  (2, 3 - 3) + (2, 3 - 2) + (2, 3 - 1) + (2, 3 - 0)
       =       0     +      1     +      2     +    2 
       =   5           

v2 choose 5 
(3, 4) =  (2, 4 - 3) + (2, 4 - 2) + (2, 4 - 1) + (2, 4 - 0)
       =       1     +      2     +      2     +    1 
       =   6

v2 choose 6                                   outside of range
(3, 5) =  (2, 5 - 3) + (2, 5 - 2) + (2, 5 - 1) + (2, 5 - 0)
       =       2     +      2     +      1     +    0 
       =   5

       etc.

Continuing with this logic, let's see if we can obtain the number of k-combinations for v3. Since the frequency of the 4th element is 4, we will need to add 5 entries together.

v3 choose 3
(4, 2) =  (3, 2 - 4) + (3, 2 - 3) + (3, 2 - 2) + (3, 2 - 1) + (3, 2 - 0)
       =       0     +      0     +     0      +      1     +     3 
       =   4

v3 choose 4 
(4, 3) =  (3, 3 - 4) + (3, 3 - 3) + (3, 3 - 2) + (3, 3 - 1) + (3, 3 - 0)
       =       0     +      0     +      1     +    3       +     5
       =   9           

v3 choose 5  
(4, 4) =  (3, 4 - 4) + (3, 4 - 3) + (3, 4 - 2) + (3, 4 - 1) + (3, 4 - 0)
       =       0     +     1      +      3     +     5      +     6
       =   15

v3 choose 6
(4, 5) =  (3, 5 - 4) + (3, 5 - 3) + (3, 5 - 2) + (3, 5 - 1) + (3, 5 - 0)
       =       1     +     3      +      5     +       6    +    5
       =   20

       etc.

And indeed, we do get the correct number of k-combinations of v3.

Example 2: z1 = {1,1,1,2}, z2 = {1,1,1,1,2,3,3,3,3,3}, and z3 = {1,1,1,1,2,3,3,3,3,3,4,4}

You will notice that we are constructing these vectors such that each successive vector contains the previous vectors. We do this so that we will be able to properly construct our modified PT. This is analogous to the traditional PT where with each successive row, we are simply adding one number to the previous row. The modified PT for these vectors is:

                1   1   1  1
             1   2   2   2   1
      1  3  5  7   8   8   7   5   3  1
  1  4   9  15  20  23   23  20  15  9  4  1

Let's construct z2 choose 6 and z3 choose 9 to see if we are correct:

 z2 choose 6
      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,]    1    1    1    2    3    3
 [2,]    1    1    1    3    3    3      This shows that we produce 7 combs
 [3,]    1    1    2    3    3    3      just as predicted by our modified
 [4,]    1    1    3    3    3    3      PT (i.e. entry (3, 6 + 1) = 7)
 [5,]    1    2    3    3    3    3
 [6,]    1    3    3    3    3    3
 [7,]    2    3    3    3    3    3


 z3 choose 9
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    1    1    1    2    3    3    3    3    3
[2,]    1    1    1    2    3    3    3    3    4
[3,]    1    1    1    2    3    3    3    4    4  This shows that we produce 9 
[4,]    1    1    1    3    3    3    3    3    4  combs just as predicted by 
[5,]    1    1    1    3    3    3    3    4    4  our modified PT (i.e. entry
[6,]    1    1    2    3    3    3    3    3    4  (4, 9 + 1) = 9)
[7,]    1    1    2    3    3    3    3    4    4
[8,]    1    1    3    3    3    3    3    4    4
[9,]    1    2    3    3    3    3    3    4    4

Just as a quick note, the first row of place holding ones is analogous to the second row of the traditional PT (i.e. 1 1). Loosely speaking (see the code for edge cases), if the first element has frequency m, the first row of the modified PT will contain m + 1 ones.

Reason why there isn't a general formula (e.g. something similar to the binomial coefficient)

As you can see from the 2 examples above, the modified PT's are based on specific multisets, and thus cannot be generalized. Even if you considered multisets of certain cardinality made up of the same distinct elements, the modified PT's would differ. For example, the multiset a = {1, 2, 2, 3, 3, 3} and b = {1, 1, 2, 2, 3, 3} generate the following modified PT's respectively:

     1 1
   1 2 2 1
1 3 5 6 5 3 1

    1 1 1
  1 2 3 2 1
1 3 6 7 6 3 1

Note that a choose 2 = 5 whereas b choose 2 = 6.

Benchmarks:

Here is a link to ideone demonstrating the speed up of the new algorithm. For the vector {4, 2, 6, 4, 9, 8, 2, 4, 1, 1, 6, 9}, the timing for the original was 2285718 clock ticks whereas the algorithm above completed in 8 clock ticks for a total speed up of 2285728 / 8 = 285714.75... over one hundred thousands times faster. They both return the same number of combinations as well (i.e. 122). Most of the speed gains come from avoiding explicitly generating any combinations (or permutations as the OP's code does).

like image 133
Joseph Wood Avatar answered Nov 14 '22 05:11

Joseph Wood