Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A more efficient way for my program to write every billionth combination?

So the following program generates combinations on characters in this master string, which you will see in the program. First the program generates all of the 48 choose 12 combinations, and then all the way up to 48 choose 19.

The problem is that the total number of combinations is 65 trillion, which is not possible to compute in a reasonable amount of time. I thought, "Ok, well I will just write every billionth one to the file." Well, that will also take a ton of time, because the program still has to count to 65 trillion, even if it only writes every billionth combination.

Is there anything I could modify in my program to avoid this having to count to an extraordinary large number, but still write every billionth combination to a file?

#include <iostream>
#include <string>
#include <iostream>
#include <fstream>
#include <vector>

using namespace std;

template <typename Iterator>
bool next_combination(const Iterator first, Iterator k, const Iterator last)
{
   if ((first == last) || (first == k) || (last == k))
      return false;
   Iterator i1 = first;
   Iterator i2 = last;
   ++i1;
   if (last == i1)
      return false;
   i1 = last;
   --i1;
   i1 = k;
   --i2;
   while (first != i1)
   {
      if (*--i1 < *i2)
      {
         Iterator j = k;
         while (!(*i1 < *j)) ++j;
         std::iter_swap(i1,j);
         ++i1;
         ++j;
         i2 = k;
         std::rotate(i1,j,last);
         while (last != j)
         {
            ++j;
            ++i2;
         }
         std::rotate(k,i2,last);
         return true;
      }
   }
   std::rotate(first,k,last);
   return false;
}

unsigned long long count = 0;

int main()
{
  ofstream myfile;
  myfile.open ("m = 8.txt");

  string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnop";

  for (int i = 12; i <= 19; i++)
  {
    std::size_t comb_size = i;

    do
    { 
      if (count == 0)
        myfile << std::string(s.begin(),s.begin() + comb_size) << std::endl;

      if (++count % 1000000000 == 0)
        myfile << std::string(s.begin(),s.begin() + comb_size) << std::endl;

    }while(next_combination(s.begin(),s.begin()+ comb_size,s.end()));
  }

  myfile.close();

  cout << "Done!" << endl;

  system("PAUSE");
  return 0;
}
like image 585
user2583947 Avatar asked Jul 15 '13 14:07

user2583947


4 Answers

I've got a simple transformation to use a different library which is about 36X faster than yours. It is still brute force. But while on my machine I'm estimating your code will take 418 days to complete, my code will take only about 3.65 days. Still outrageously long. But it gets it down to a long weekend.

Here's my code:

#include <iostream>
#include <string>
#include <fstream>
#include "../combinations/combinations"

using namespace std;

unsigned long long count = 0;

int main()
{
  ofstream myfile("m = 8.txt");

  string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnop";

  for (int i = 12; i <= 19; i++)
     for_each_combination(s.begin(), s.begin() + i, s.end(),
        [&](std::string::const_iterator f, std::string::const_iterator l) -> bool
        {
          if (::count++ % 1000000000 == 0)
            myfile << std::string(f, l) << std::endl;
          return false;
        });

  myfile.close();

  cout << "Done!" << endl;
  return 0;
}

Cutting the number of tests on count in the inner loop was a 15% performance increase.

"../combinations/combinations" refers to this library:

http://howardhinnant.github.io/combinations.html

The link includes a description and full source code.

This test program can also easily be modified to count the total number of combinations:

#include <iostream>
#include <string>
#include "../combinations/combinations"

using namespace std;


int main()
{
  string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnop";
  unsigned long long count = 0;
  for (int i = 12; i <= 19; i++)
     count += count_each_combination(s.begin(), s.begin() + i, s.end());

  cout << "Done! " << count << endl;
  return 0;
}

which outputs:

Done! 27189132782091

The code is open source with a boost license (it is not part of the boost library). Feel free to use it.

like image 165
Howard Hinnant Avatar answered Nov 18 '22 00:11

Howard Hinnant


This is the code I wrote before for finding the kth permutation of a given string. I think my idea is similar to @Tarik that we don't need to list all the permutations before the kth.

string getPermutation(string s, int k) {
    string res;
    int n = s.size();
    int total = 1, digits = n - 1;
    for (int i = 1; i < n; ++i)
        total *= i;
    while (res.size() < n)
    {
        int i = 0;
        for (int m = 1; m < (int) ceil(k * 1.0 / total); ++m)
            i++;
        res += s[i];
        s.erase(s.begin() + i); // erase from string is not a good idea:)
        k = (k % total == 0) ? total : k % total;
        total = (total == 1) ? 1 : total / digits--;
    }
    return res;
}

It works well for short string. For example getPermutation("12345", 37) will return 24135.

But for your string s with length 48, the variable total will overflow even with type long long. So we need to do extra work handling this.

My code is somewhat hard to understand:) You may improve on my code. I hope this will help you.

UPDADE: I realize that what you need is combination not permutation. I totally went wrong! Forget my code:)

like image 20
Annie Kim Avatar answered Nov 17 '22 23:11

Annie Kim


From http://en.wikipedia.org/wiki/Combinadic there is an algorithm to compute directly the k-th combination. You need first to store Pascal's triangle. If you need some code example, you can have a look at (Python language) https://github.com/sagemath/sagelib/blob/master/sage/combinat/choose_nk.py.

like image 1
hivert Avatar answered Nov 17 '22 23:11

hivert


You can use a bitvector to speed up some of the computations, adapted from the bit-twiddling pages of Chess Programming Wiki.

#include <iostream>
#include <iomanip>
#include <cstdint>

using U64 = uint64_t;

// generate the next integer with the same number of bits as c
U64 next_combination(U64 c) 
{
    auto const smallest = c & -c;
    auto const ripple = c + smallest;
    auto ones = c ^ ripple;
    ones = (ones >> 2) / smallest;
    return ripple | ones;
}

// generate all integers with k of the first n bits set
template<class Function>
void for_each_combination(std::size_t n, std::size_t k, Function fun)
{
    U64 y;
    auto const n_mask = (1ULL << n) - 1; // mask with all n bits set to 1
    auto const k_mask = (1ULL << k) - 1; // mask with first k bits set to 1

    auto x = k_mask; fun(x);
    for (; (y = next_combination(x) & n_mask) > x; x = y) fun(y);
}

int main() 
{
    auto const million = 1000000ULL;
    auto count = U64 { 0 };
    for (auto i = 12; i < 20; ++i) {
        for_each_combination(48, i, [&](U64 c) {
        /*if (count++ & million == 0) std::cout << std::dec << std::setfill(' ') << std::setw(8) << (count - 1) / million << ": " << std::hex << std::showbase << std::setfill('0') << std::setw(16) << c << "\n";*/
            ++count;
        });
    }
    std::cout << count << "\n";
}

On a single core inside a virtualbox of my Xeon E5-1650 @3.2 Ghz, my best estimate is that it will take 3.52 days to increment the counter 2.7e13 times (not generating the output itself). It only works for subsets B(n, k) with n < 64, unless you use some 128-bit integer class.

Given a bitvector that has k out of n bits set to 1, it is a straightforward matter to map that to the original sequence of chars or any other type and print whatever combination is required. For sequences that do not have random iterator access, it is of course more expensive than Howard Hinnant's approach.

like image 1
TemplateRex Avatar answered Nov 18 '22 00:11

TemplateRex