I've been scratching my head about this for two days now and I cannot come up with a solution. What I'm looking for is a function f(s, n)
such that it returns a set containing all subsets of s
where the length of each subset is n
.
Demo:
s={a, b, c, d}
f(s, 4)
{{a, b, c, d}}
f(s, 3)
{{a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}}
f(s, 2)
{{a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d}}
f(s, 1)
{{a}, {b}, {c}, {d}}
I have a feeling that recursion is the way to go here. I've been fiddling with something like
f(S, n):
for s in S:
t = f( S-{s}, n-1 )
...
But this does not seem to do the trick. I did notice that len(f(s,n))
seems to be the binomial coefficient bin(len(s), n)
. I guess this could be utilized somehow.
Can you help me please?
Let us call n
the size of the array and k
the number of elements to be out in a subarray.
Let us consider the first element A[0]
of the array A
.
If this element is put in the subset, the problem becomes a (n-1, k-1)
similar problem.
If not, it becomes a (n-1, k)
problem.
This can be simply implemented in a recursive function.
We just have to pay attention to deal with the extreme cases k == 0
or k > n
.
During the process, we also have to keep trace of:
n: the number of remaining elements of A to consider
k: the number of elements that remain to be put in the current subset
index: the index of the next element of A to consider
The current_subset
array that memorizes the elements already selected.
Here is a simple code in c++ to illustrate the algorithm
Output
For 5 elements and subsets of size 3:
3 4 5
2 4 5
2 3 5
2 3 4
1 4 5
1 3 5
1 3 4
1 2 5
1 2 4
1 2 3
#include <iostream>
#include <vector>
void print (const std::vector<std::vector<int>>& subsets) {
for (auto &v: subsets) {
for (auto &x: v) {
std::cout << x << " ";
}
std::cout << "\n";
}
}
// n: number of remaining elements of A to consider
// k: number of elements that remain to be put in the current subset
// index: index of next element of A to consider
void Get_subset_rec (std::vector<std::vector<int>>& subsets, int n, int k, int index, std::vector<int>& A, std::vector<int>& current_subset) {
if (n < k) return;
if (k == 0) {
subsets.push_back (current_subset);
return;
}
Get_subset_rec (subsets, n-1, k, index+1, A, current_subset);
current_subset.push_back(A[index]);
Get_subset_rec (subsets, n-1, k-1, index+1, A, current_subset);
current_subset.pop_back(); // remove last element
return;
}
void Get_subset (std::vector<std::vector<int>>& subsets, int subset_length, std::vector<int>& A) {
std::vector<int> current_subset;
Get_subset_rec (subsets, A.size(), subset_length, 0, A, current_subset);
}
int main () {
int subset_length = 3; // subset size
std::vector A = {1, 2, 3, 4, 5};
int size = A.size();
std::vector<std::vector<int>> subsets;
Get_subset (subsets, subset_length, A);
std::cout << subsets.size() << "\n";
print (subsets);
}
Live demo
One way to solve this is by backtracking. Here's a possible algorithm in pseudo code:
def backtrack(input_set, idx, partial_res, res, n):
if len(partial_res == n):
res.append(partial_res[:])
return
for i in range(idx, len(input_set)):
partial_res.append(input_set[i])
backtrack(input_set, idx+1, partial_res, res, n) # path with input_set[i]
partial_res.pop()
backtrack(input_set, idx+1, partial_res, res, n) # path without input_set[i]
Time complexity of this approach is O(2^len(input_set))
since we make 2 branches at each element of input_set
, regardless of whether the path leads to a valid result or not. The space complexity is O(len(input_set) choose n)
since this is the number of valid subsets you get, as you correctly pointed out in your question.
Now, there is a way to optimize the above algorithm to reduce the time complexity to O(len(input_set) choose n)
by pruning the recursive tree to paths that can lead to valid results only.
If n - len(partial_res) < len(input_set) - idx + 1
, we are sure that even if we took every remaining element in input_set[idx:]
we are still short at least one to reach n
. So we can employ this as a base case and return and prune.
Also, if n - len(partial_res) == len(input_set) - idx + 1
, this means that we need each and every element in input_set[idx:]
to get the required n
length result. Thus, we can't skip any elements and so the second branch of our recursive call becomes redundant.
backtrack(input_set, idx+1, partial_res, res, n) # path without input_set[i]
We can skip this branch with a conditional check.
Implementing these base cases correctly, reduces the time complexity of the algorithm to O(len(input_set) choose k)
, which is a hard limit because that's the number of subsets that there are.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With