
n-th or Arbitrary Combination of a Large Set

Say I have the set of numbers [0, ..., 499]. Combinations are currently being generated sequentially using the C++ std::next_permutation. For reference, the size of each tuple I am pulling out is 3, so I am returning sequential results such as [0,1,2], [0,1,3], [0,1,4], ... [497,498,499].

Now, I want to parallelize the code that this is sitting in, so a sequential generation of these combinations will no longer work. Are there any existing algorithms for computing the ith combination of 3 from 500 numbers?

I want to make sure that each thread, regardless of which loop iterations it gets, can compute a standalone combination based on the i it is iterating with. So if I want the combination for i=38 in thread 1, I can compute [0,1,40] while simultaneously computing i=0 in thread 2 as [0,1,2].

EDIT: The statement below is irrelevant; I mixed myself up.

I've looked at algorithms that utilize factorials to narrow down each individual element from left to right, but I can't use these as 500! sure won't fit into memory. Any suggestions?

anon_dev1234 asked Feb 25 '13 01:02


2 Answers

Here is my shot:

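#include <iostream>

int main() {
    int k = 527;              // zero-based index of the combination to compute
    const int N = 500;        // number of elements in the set
    int a = 0, b = 1, c = 2;  // the three elements of the result

    // Skip whole blocks of combinations that share the first element a;
    // each block holds (N-a-1)*(N-a-2)/2 combinations.
    while (k >= (N - a - 1) * (N - a - 2) / 2) {
        k -= (N - a - 1) * (N - a - 2) / 2;
        a++;
    }
    // Skip blocks that share the second element b; each holds N-1-b combinations.
    b = a + 1;
    while (k >= N - 1 - b) {
        k -= N - 1 - b;
        b++;
    }
    // Whatever is left of k is the offset of the third element.
    c = b + 1 + k;

    std::cout << "[" << a << "," << b << "," << c << "]" << std::endl;  // prints [0,2,32]
    return 0;
}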

I got this by thinking about how many combinations there are before the next number is incremented. However, it only works for three elements. I can't guarantee that it is correct. It would be cool if you compared it to your results and gave some feedback.
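The same counting idea can be extended beyond three elements. Below is a minimal sketch, assuming lexicographic order over {0, ..., N-1} and a zero-based index. The names `binom` and `unrank` are made up for illustration and are not part of the answer's code. For each slot it skips whole blocks of combinations that share the same value in that slot, exactly as the two while loops do for a and b.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// C(n, k), multiplying and dividing in turn so intermediates stay small.
// Returns 0 when the choice is impossible (k < 0 or n < k).
std::uint64_t binom(int n, int k) {
    if (k < 0 || n < k) return 0;
    std::uint64_t r = 1;
    for (int d = 1; d <= k; ++d) {
        r *= static_cast<std::uint64_t>(n - d + 1);
        r /= static_cast<std::uint64_t>(d);  // exact: r is C(n, d) after this step
    }
    return r;
}

// Hypothetical helper: the index-th K-combination of {0, ..., N-1}
// in lexicographic order, with index counted from zero.
std::vector<int> unrank(std::uint64_t index, int N, int K) {
    if (index >= binom(N, K)) throw std::out_of_range("index >= C(N, K)");
    std::vector<int> combo;
    combo.reserve(K);
    int v = 0;  // smallest value still allowed in the current slot
    for (int slot = 0; slot < K; ++slot) {
        const int remaining = K - 1 - slot;  // elements still to place after this slot
        // Number of combinations that put v in this slot.
        std::uint64_t block = binom(N - 1 - v, remaining);
        while (index >= block) {
            index -= block;
            ++v;
            block = binom(N - 1 - v, remaining);
        }
        combo.push_back(v);
        ++v;  // later slots must hold strictly larger values
    }
    return combo;
}
```

With N = 500 and K = 3, `unrank(527, 500, 3)` gives 0, 2, 32, the same triple the loop above produces for k = 527. Because each call depends only on its own index, threads can call it independently.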

Haatschii answered Nov 19 '22 12:11


If you are looking for a way to obtain the lexicographic index or rank of a unique combination instead of a permutation, then your problem is one of working with the binomial coefficient. The binomial coefficient counts the unique combinations that can be chosen in groups of K from a total of N items.

I have written a class in C# to handle common functions for working with the binomial coefficient. It performs the following tasks:

  1. Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters.

  2. Converts the K-indexes to the proper lexicographic index or rank of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle and is very efficient compared to iterating over the set.

  3. Converts the index in a sorted binomial coefficient table to the corresponding K-indexes. I believe it is also faster than older iterative solutions.

  4. Uses Mark Dominus's method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.

  5. The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to use the 4 above methods. Accessor methods are provided to access the table.

  6. There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
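For readers working in C++, here is a rough sketch of the ideas behind items 2 and 4. It is not a port of the C# class. The `binom` loop multiplies and divides in turn, which is how the method attributed to Mark Dominus is usually described. `rank_of` is a hypothetical name, and it assumes the combination is strictly increasing and drawn from {0, ..., N-1}. It counts the combinations that sort after the given one and subtracts that from the total, so it needs K binomial lookups and no iteration over the set.

```cpp
#include <cstdint>
#include <vector>

// C(n, k) with alternating multiply and divide; 0 if the choice is impossible.
std::uint64_t binom(int n, int k) {
    if (k < 0 || n < k) return 0;
    std::uint64_t r = 1;
    for (int d = 1; d <= k; ++d) {
        r *= static_cast<std::uint64_t>(n - d + 1);
        r /= static_cast<std::uint64_t>(d);  // exact: r is C(n, d) after this step
    }
    return r;
}

// Hypothetical helper: zero-based lexicographic rank of a strictly
// increasing combination whose elements lie in {0, ..., N-1}.
std::uint64_t rank_of(const std::vector<int>& combo, int N) {
    const int K = static_cast<int>(combo.size());
    std::uint64_t after = 0;  // combinations that sort after `combo`
    for (int i = 0; i < K; ++i) {
        // Same first i elements, larger element at position i:
        // choose the remaining K-i elements from the N-1-combo[i] larger values.
        after += binom(N - 1 - combo[i], K - i);
    }
    return binom(N, K) - 1 - after;
}
```

For example, `rank_of({0, 1, 2}, 500)` is 0 and `rank_of({497, 498, 499}, 500)` is 20,708,499, the last index of 500 choose 3.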

To read about this class and download the code, see Tablizing The Binomial Coeffieicent.

The following tested code iterates through each unique combination:

public void Test10Choose5()
{
   String S;
   int Loop;
   int N = 500;  // Total number of elements in the set.
   int K = 3;  // Total number of elements in each group.
   // Create the bin coeff object required to get all
   // the combos for this N choose K combination.
   BinCoeff<int> BC = new BinCoeff<int>(N, K, false);
   int NumCombos = BinCoeff<int>.GetBinCoeff(N, K);
   // The KIndexes array specifies the indexes for a lexicographic element.
   int[] KIndexes = new int[K];
   StringBuilder SB = new StringBuilder();
   // Loop through all the combinations for this N choose K case.
   for (int Combo = 0; Combo < NumCombos; Combo++)
   {
      // Get the k-indexes for this combination.
      BC.GetKIndexes(Combo, KIndexes);
      // Verify that the KIndexes returned can be used to retrieve the
      // rank or lexicographic order of the KIndexes in the table.
      int Val = BC.GetIndex(true, KIndexes);
      if (Val != Combo)
      {
         S = "Val of " + Val.ToString() + " != Combo Value of " + Combo.ToString();
         Console.WriteLine(S);
      }
      SB.Remove(0, SB.Length);
      for (Loop = 0; Loop < K; Loop++)
      {
         SB.Append(KIndexes[Loop].ToString());
         if (Loop < K - 1)
            SB.Append(" ");
      }
      S = "KIndexes = " + SB.ToString();
      Console.WriteLine(S);
   }
}

You should be able to port this class over fairly easily to C++. You probably will not have to port over the generic part of the class to accomplish your goals. Your test case of 500 choose 3 yields 20,708,500 unique combinations, which will fit in a 4-byte int. If 500 choose 3 is simply an example case and you need to choose groups larger than 3, then you will have to use longs or perhaps a larger integer type.
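To see where a 4-byte int stops being enough, a small check can help. The sketch below assumes C++17 for `std::optional`. The names `checked_binom` and `fits_in_int32` are illustrative and do not come from the C# class. The overflow test is conservative: it reports failure as soon as an intermediate product would exceed 64 bits.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// C(n, k) in 64 bits, or std::nullopt if an intermediate product would overflow.
std::optional<std::uint64_t> checked_binom(std::uint64_t n, std::uint64_t k) {
    if (k > n) return std::uint64_t{0};
    if (k > n - k) k = n - k;  // C(n, k) == C(n, n-k); use the shorter loop
    std::uint64_t r = 1;
    for (std::uint64_t d = 1; d <= k; ++d) {
        const std::uint64_t factor = n - d + 1;
        if (r > std::numeric_limits<std::uint64_t>::max() / factor) {
            return std::nullopt;  // r * factor would not fit in 64 bits
        }
        r = r * factor / d;  // exact: r is C(n, d) after this step
    }
    return r;
}

// True when the total number of combinations fits in a signed 32-bit int.
bool fits_in_int32(std::uint64_t n, std::uint64_t k) {
    const auto total = checked_binom(n, k);
    const auto limit =
        static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
    return total.has_value() && *total <= limit;
}
```

Under these assumptions `fits_in_int32(500, 3)` is true, while `fits_in_int32(500, 4)` is false because 500 choose 4 is 2,573,031,125. So even K = 4 already calls for a 64-bit index.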

Bob Bryan answered Nov 19 '22 13:11