
Iterate through different subset of size k

I have an array of n integers (not necessarily distinct!) and I would like to iterate over all subsets of size k. However I'd like to exclude all duplicate subsets.

e.g.

array = {1,2,2,3,3,3,3}, n = 7, k = 2

then the subsets I want to iterate over (each once) are:

{1,2},{1,3},{2,2},{2,3},{3,3}

What is an efficient algorithm for doing this? Is a recursive approach the most efficient/elegant?

In case you have a language-specific answer, I'm using C++.

Asked by Alex, May 28 '15 00:05


4 Answers

The same (or almost the same) algorithm that is used to generate combinations of a set of unique values in lexicographical order can be used to generate combinations of a multiset in lexicographical order. Doing it this way avoids the need to deduplicate, which is horribly expensive, and also avoids the need to keep all the generated combinations around. It does require that the original list of values be sorted.

The following simple implementation finds the next k-combination of a multiset of n values in average (and worst-case) time O(n). It expects two ranges: the first range is a sorted k-combination, and the second range is the sorted multiset. (If either range is unsorted, or the values in the first range do not constitute a sub(multi)set of the second range, then the behaviour is undefined; no sanity checks are made.)

Only the end iterator of the second range is actually used, but accepting just that one iterator seemed to make the calling convention a bit odd, so the function takes the whole range.

#include <functional>
#include <iterator>

template<typename BidiIter, typename CBidiIter,
         typename Compare = std::less<typename std::iterator_traits<BidiIter>::value_type>>
bool next_comb(BidiIter first, BidiIter last,
               CBidiIter /* first_value */, CBidiIter last_value,
               Compare comp=Compare()) {
  /* 1. Find the rightmost value which could be advanced, if any */
  auto p = last;
  while (p != first && !comp(*std::prev(p), *--last_value)) --p;
  if (p == first) return false;  /* this was the last combination */
  /* 2. Find the smallest value which is greater than the selected value */
  for (--p; comp(*p, *std::prev(last_value)); --last_value) { }
  /* 3. Overwrite the suffix of the subset with the lexicographically smallest
   *    sequence starting with the new value */
  while (p != last) *p++ = *last_value++;
  return true;
}

It should be clear that steps 1 and 2 combined make at most O(n) comparisons, because each of the n values is used in at most one comparison. Step 3 copies at most O(k) values, and we know that k ≤ n.

This could be improved to O(k) in the case where no values are repeated, by maintaining the current combination as a container of iterators into the value list rather than actual values. This would also avoid copying values, at the cost of extra dereferences. If in addition we cache the function which maps each value iterator to an iterator to the first instance of the next larger value, we could eliminate Step 2 and reduce the algorithm to O(k) even for repeated values. That might be worthwhile if there are a large number of repeats and comparisons are expensive.
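The cached variant described above might look roughly like the sketch below. It is not code from this answer: the names `next_greater_table`, `next_comb_indices` and `distinct_combinations` are made up for illustration, the combination is stored as indices into the sorted value list rather than as iterators, and the element type is fixed to `int` to keep it short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: table[i] is the index of the first element strictly
// greater than values[i], or values.size() if there is none.
// Assumes values is sorted in ascending order.
std::vector<std::size_t> next_greater_table(const std::vector<int>& values) {
    assert(std::is_sorted(values.begin(), values.end()));
    const std::size_t n = values.size();
    std::vector<std::size_t> table(n, n);
    for (std::size_t i = n; i >= 2; --i) {
        // Compare element i-2 with its right neighbour i-1.
        table[i - 2] = (values[i - 2] == values[i - 1]) ? table[i - 1] : i - 1;
    }
    return table;
}

// Hypothetical index-based step: advances comb (increasing indices into the
// sorted values) to the next distinct combination using O(k) work and no
// value comparisons. Returns false if comb was the last combination.
bool next_comb_indices(std::vector<std::size_t>& comb,
                       const std::vector<std::size_t>& table) {
    const std::size_t k = comb.size();
    const std::size_t n = table.size();
    for (std::size_t j = k; j-- > 0; ) {
        const std::size_t start = table[comb[j]];
        // Position j can advance only if the new index plus the k - j - 1
        // positions after it still fit inside the value list.
        if (start + (k - j) <= n) {
            for (std::size_t i = j; i < k; ++i) comb[i] = start + (i - j);
            return true;
        }
    }
    return false;
}

// Collects every distinct k-combination of a sorted multiset, as values.
std::vector<std::vector<int>> distinct_combinations(const std::vector<int>& values,
                                                    std::size_t k) {
    std::vector<std::vector<int>> out;
    if (k > values.size()) return out;
    const std::vector<std::size_t> table = next_greater_table(values);
    std::vector<std::size_t> comb(k);
    for (std::size_t i = 0; i < k; ++i) comb[i] = i;  // the first k values
    do {
        std::vector<int> current;
        for (std::size_t idx : comb) current.push_back(values[idx]);
        out.push_back(current);
    } while (next_comb_indices(comb, table));
    return out;
}
```

For `{1,2,2,3,3,3,3}` the table is `{1,3,3,7,7,7,7}`, and `distinct_combinations(values, 2)` visits `{1,2}`, `{1,3}`, `{2,2}`, `{2,3}`, `{3,3}` in that order. The table costs O(n) to build once, so the trade only pays off when many combinations are generated from the same value list.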

Here's a simple use example:

std::vector<int> values = {1,2,2,3,3,3,3};
/* Since that's sorted, the first subset is just the first k values */
const int k = 2;
std::vector<int> subset{values.cbegin(), values.cbegin() + k};

/* Print each combination */
do {
  for (auto const& v : subset) std::cout << v << ' ';
  std::cout << '\n';
} while (next_comb(subset.begin(),  subset.end(),
                   values.cbegin(), values.cend()));


Answered by rici, Nov 02 '22 05:11


I like bit-twiddling for this problem. Sure, it limits you to at most 31 elements in your vector (the loop bound 1 << n below has to fit in a 32-bit mask), but it's still cool.

First, given a bit mask, determine the next bitmask permutation (source):

uint32_t next(uint32_t v) {
    uint32_t t = v | (v - 1);
    return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));  
}
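
To see what this function produces, here is a small sketch that collects every mask with k of the low n bits set. `masks_with_k_bits` is a name invented for this illustration, `next_mask` simply repeats the function above under a different name, and `__builtin_ctz` is a GCC/Clang builtin, so the sketch assumes one of those compilers.

```cpp
#include <cstdint>
#include <vector>

// Same bit trick as above, repeated so that this block stands alone.
// Requires v != 0, because __builtin_ctz(0) is undefined.
std::uint32_t next_mask(std::uint32_t v) {
    std::uint32_t t = v | (v - 1);  // set all bits below the lowest set bit
    return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
}

// Hypothetical helper: all masks with exactly k of the low n bits set,
// in increasing numeric order. Assumes 1 <= k <= n <= 31.
std::vector<std::uint32_t> masks_with_k_bits(unsigned n, unsigned k) {
    std::vector<std::uint32_t> masks;
    const std::uint32_t limit = std::uint32_t{1} << n;
    for (std::uint32_t v = (std::uint32_t{1} << k) - 1; v < limit; v = next_mask(v)) {
        masks.push_back(v);
    }
    return masks;
}
```

For n = 4 and k = 2 this yields 3, 5, 6, 9, 10, 12, which is 0011, 0101, 0110, 1001, 1010, 1100 in binary: each mask is the next larger number with the same count of set bits.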

Next, given a vector and a bitmask, give a new vector based on that mask:

std::vector<int> filter(const std::vector<int>& v, uint32_t mask) {
    std::vector<int> res;
    while (mask) {
        res.push_back(v[__builtin_ctz(mask)]);
        mask &= mask - 1;
    }   
    return res;
}

And with that, we just need a loop:

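std::set<std::vector<int>> get_subsets(const std::vector<int>& arr, uint32_t k) {
    // arr must be sorted, so that equal subsets produce identical vectors,
    // and must have fewer than 32 elements so that 1 << arr.size() fits.
    // k must satisfy 1 <= k < 32: next(0) would call __builtin_ctz(0),
    // which is undefined.
    std::set<std::vector<int>> s;
    uint32_t max = (uint32_t{1} << arr.size());
    for (uint32_t v = (uint32_t{1} << k) - 1; v < max; v = next(v)) {
        s.insert(filter(arr, v));
    }
    return s;
}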

int main()
{
    auto s = get_subsets({1, 2, 2, 3, 3, 3, 3}, 2);
    std::cout << s.size() << std::endl; // prints 5
}

Answered by Barry, Nov 02 '22 05:11


The basic idea of this solution is a function like next_permutation, but one which generates the next ascending sequence of "digits". Here it is called ascend_ordered.

template< class It >
auto ascend_ordered( const int n_digits, const It begin, const It end )
    -> bool
{
    using R_it = reverse_iterator< It >;
    const R_it r_begin  = R_it( end );
    const R_it r_end    = R_it( begin );

    int max_digit = n_digits - 1;
    for( R_it it = r_begin ; it != r_end; ++it )
    {
        if( *it < max_digit )
        {
            ++*it;
            const int n_further_items = it - r_begin;
            for( It it2 = end - n_further_items; it2 != end; ++it2 )
            {
                *it2 = *(it2 - 1) + 1;
            }
            return true;
        }
        --max_digit;
    }
    return false;
}
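
As a rough illustration of the sequence this produces, here is the same step restated with plain indexing on a `std::vector<int>`. The names `next_ascending` and `all_ascending` are invented for this sketch and are not part of the answer's code; it assumes k is no larger than n_digits.

```cpp
#include <vector>

// Hypothetical restatement of the step above: advances a strictly increasing
// sequence of indices in [0, n_digits) to the next one in lexicographic
// order. Returns false when the sequence was already the last one.
bool next_ascending(std::vector<int>& idx, int n_digits) {
    const int k = static_cast<int>(idx.size());
    for (int pos = k - 1; pos >= 0; --pos) {
        // The largest value position pos may hold is n_digits - (k - pos).
        if (idx[pos] < n_digits - (k - pos)) {
            ++idx[pos];
            // Reset everything to the right to the smallest ascending run.
            for (int i = pos + 1; i < k; ++i) idx[i] = idx[i - 1] + 1;
            return true;
        }
    }
    return false;
}

// Collects the whole sequence, starting from 0, 1, ..., k-1.
std::vector<std::vector<int>> all_ascending(int n_digits, int k) {
    std::vector<int> idx(k);
    for (int i = 0; i < k; ++i) idx[i] = i;
    std::vector<std::vector<int>> out;
    do {
        out.push_back(idx);
    } while (next_ascending(idx, n_digits));
    return out;
}
```

With n_digits = 4 and k = 2 the index sequences are {0,1}, {0,2}, {0,3}, {1,2}, {1,3}, {2,3}. For the array in the question (n = 7, k = 2) there are 21 index sequences, which the main program then filters down to the 5 distinct value subsets.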

Main program for the case at hand:

auto main() -> int
{
    vector<int> a = {1,2,2,3,3,3,3};
    assert( is_sorted( begin( a ), end( a ) ) );
    const int k = 2;
    const int n = a.size();
    vector<int> indices( k );
    iota( indices.begin(), indices.end(), 0 );      // Fill with 0, 1, 2 ...
    set<vector<int>> encountered;
    for( ;; )
    {
        vector<int> current;
        for( int const i : indices ) { current.push_back( a[i] ); }
        if( encountered.count( current ) == 0 )
        {
            cout << "Indices " << indices << " -> values " << current << endl;
            encountered.insert( current );
        }
        if( not ascend_ordered( n, begin( indices ), end( indices ) ) )
        {
            break;
        }
    }
}

Supporting includes and i/o:

#include <algorithm>
using std::is_sorted;

#include <assert.h>

#include <iterator>
using std::reverse_iterator;

#include <iostream>
using std::ostream; using std::cout; using std::endl;

#include <numeric>
using std::iota;

#include <set>
using std::set;

#include <utility>
using std::begin; using std::end;

#include <vector>
using std::vector;

template< class Container, class Enable_if = typename Container::value_type >
auto operator<<( ostream& stream, const Container& c )
    -> ostream&
{
    stream << "{";
    int n_items_outputted = 0;
    for( const int x : c )
    {
        if( n_items_outputted >= 1 ) { stream << ", "; }
        stream << x;
        ++n_items_outputted;
    }
    stream << "}";
    return stream;
}

Answered by Cheers and hth. - Alf, Nov 02 '22 07:11


Unlike the bit-twiddling answer, this one is not as efficient and doesn't do anything as fancy. However, it does not limit the size of your array or the size of the subset.

This solution uses std::next_permutation to generate the combinations, and takes advantage of std::set's uniqueness property.

#include <algorithm>
#include <vector>
#include <set>
#include <iostream>
#include <iterator>

using namespace std;

std::set<std::vector<int>> getSubsets(const std::vector<int>& vect, size_t numToChoose)
{
    std::set<std::vector<int>> returnVal;
    // return the whole thing if we want to
    // choose everything 
    if (numToChoose >= vect.size())
    {
        returnVal.insert(vect);
        return returnVal;
    }

    // set up bool vector for combination processing
    std::vector<bool> bVect(vect.size() - numToChoose, false);

    // stick the true values at the end of the vector
    bVect.resize(bVect.size() + numToChoose, true); 

    // select where the ones are set in the bool vector and populate
    // the combination vector
    do
    {
        std::vector<int> combination;
        for (size_t i = 0; i < bVect.size() && combination.size() < numToChoose; ++i)
        {
            if (bVect[i])
                combination.push_back(vect[i]);
        }
        // sort the combinations
        std::sort(combination.begin(), combination.end());

        // insert this new combination in the set
        returnVal.insert(combination);
    } while (next_permutation(bVect.begin(), bVect.end()));
    return returnVal;
}

int main()
{
    std::vector<int> myVect = {1,2,2,3,3,3,3};

    // number to select
    size_t numToSelect = 3;

    // get the subsets
    std::set<std::vector<int>> subSets = getSubsets(myVect, numToSelect);

    // output the results
    for_each(subSets.begin(), subSets.end(), [] (const vector<int>& v) 
    { cout << "subset "; copy(v.begin(), v.end(), ostream_iterator<int>(cout, " ")); cout << "\n"; });
}

Live example: http://coliru.stacked-crooked.com/a/beb800809d78db1a

Basically we set up a bool vector and populate a vector with the values that correspond with the position of the true items in the bool vector. Then we sort and insert this into a set. The std::next_permutation shuffles the true values in the bool array around and we just repeat.

Admittedly, not as sophisticated and more than likely slower than the previous answer, but it should do the job.

Answered by PaulMcKenzie, Nov 02 '22 06:11