I've been trying to figure out a way to generate all distinct size-n partitions of a multiset, but so far have come up empty handed. First let me show what I'm trying to archieve.
Let's say we have an input vector of uint32_t
:
std::vector<uint32_t> input = {1, 1, 2, 2}
An let's say we want to create all distinct 2-size partitions. There's only two of these, namely:
[[1, 1], [2, 2]], [[1, 2], [1, 2]]
Note that order does not matter, i.e. all of the following are duplicate, incorrect solutions.
Duplicate because order within a permutation group does not matter:
[[2, 1], [1, 2]]
Duplicate because order of groups does not matter:
[[2, 2], [1, 1]]
Not homework of some kind BTW. I encountered this while coding something at work, but by now it is out of personal interest that I'd like to know how to deal with this. The parameters for the work-related problem were small enough that generating a couple thousand duplicate solutions didn't really matter.
In order to illustrate that I'm not just asking without having tried to come up with a solution, let me try to explain my current algorithm (which generates duplicate solutions when used with multisets).
It works as follows: the state has a bitset with n bits set to 1 for each partition block. The length of the bitsets is size(input) - n * index_block()
, e.g. if the input vector has 8 elements and n = 2, then the first partition block uses an 8-bit bitset with 2 bits set to 1, the next partition block uses a 6-bit bitset with 2 bits set to 1, etc.
A partition is created from these bitsets by iterating over each bitset in order and extracting the elements of the input vector with indices equal to the position of 1-bits in the current bitset.
In order to generate the next partition, I iterate over the bitsets in reverse order. The next bitset permutation is calculated (using a reverse of Gosper's hack). If the first bit in the current bitset is not set (i.e. vector index 0 not selected), then that bitset is reset to its starting state. Enforcing that the first bit is always set prevents generating duplicates when creating size-n set partitions (duplicates of the 2nd kind shown above). If the current bitset is equal to its starting value, this step is then repeated for the previous (longer) bitset.
This works great (and very fast) for sets. However, when used with multisets it generates duplicate solutions, since it is unaware that both elements appear more than once in the input vector. Here's some example output:
std::vector<uint32_t> input = {1, 2, 3, 4};
printAllSolutions(myCurrentAlgo(input, 2));
=> [[2, 1], [4, 3]], [[3, 1], [4, 2]], [[4, 1], [3, 2]]
std::vector<uint32_t> input = {1, 1, 2, 2};
printAllSolutions(myCurrentAlgo(input, 2));
=> [[1, 1], [2, 2]], [[2, 1], [2, 1]], [[2, 1], [2, 1]]
That last (duplicate) solution is generated simply because the algorithm is unaware of duplicates in the input, it generates the exact same internal states (i.e. which indices to select) in both examples.
I guess it's pretty clear by now what I'm trying to end up with. Just for the sake of completeness, it would look somewhat as follows:
std::vector<uint32_t> multiset = {1, 1, 2, 2};
MagicClass myGenerator(multiset, 2);
do {
std::vector<std::vector<uint32_t> > nextSolution = myGenerator.getCurrent();
std::cout << nextSolution << std::endl;
} while (myGenerator.calcNext());
=> [[1, 1], [2, 2]]
[[1, 2], [1, 2]]
I.e. the code would work somewhat like std::next_permutation
, informing that is has generated all solutions and has ended back at the "first" solution (for whatever definition of first you want to use, probably lexicographically, but doesn't need to be).
The closest related algorithm I found is Algorithm M from Knuth's The Art of Computer Programming, Volume 4 Part 1, section 7.2.1.5 (p. 430). However, that generates all possible multiset partitions. There is also an exercise in the book (7.2.1.5.69, solution on p. 778) about how to modify Alg. M in order to generate only solutions with at most r partitions. However, that still allows partitions of different sizes (e.g. [[1, 2, 2], [1]]
would be a valid output for r = 2).
Any ideas/tricks/existing algorithms on how to go about this? Note that the solution should be efficient, i.e. keeping track of all previously generated solutions, figuring out if the currently generated one is a permutation and if so skipping it, is infeasible because of the rate by which the solution space explodes for longer inputs with more duplicates.
Total count = k * S (n-1, k) + S (n-1, k-1). Create a recursive function which accepts two parameters, n and k. The function returns total number of partitions of n elements into k sets. Handle the base cases. If n = 0 or k = 0 or k > n return 0 as there cannot be any subset. If n is equal to k or k is equal to 1 return 1.
Let the index of non-one value be k. Decrease the value of p [k] by 1 and increase rem_val by 1. Now there may be two cases: a) If p [k] is more than or equal to rem_val. This is a simple case (we have the sorted order in new partition). Put rem_val at p [k+1] and p [0…k+1] is our new partition.
There are two cases. The previous n – 1 elements are divided into k partitions, i.e S (n-1, k) ways. Put this nth element into one of the previous k partitions.
We initialize p [] as n where n is the input number. In every iteration. we first print p [] and then update p [] to store the next partition. So the main problem is to get the next partition from a given partition. We are given current partition in p [] and its size. We need to update p [] to store next partition.
A recursive algorithm to distribute the elements one-by-one could be based on a few simple rules:
{A,B,D,C,C,D,B,A,C} -> {A,A,B,B,D,D,C,C,C}
{ , , } { , , } { , , }
{A, , } { , , } { , , }
^dup^
{A, , } {A, , } {A, , }
^dup^ ^dup^
partial solution: {A, , } {A, , } { , , }
^dup^
insert element B: {A,B, } {A, , } { , , }
{A, , } {A, , } {B, , }
partial solution: {A, , } {B, , } { , , }
insert another B: {A,B, } {B, , } { , , } <- ILLEGAL
{A, , } {B,B, } { , , } <- OK
{A, , } {B, , } {B, , } <- OK
partial solution: {A, , } {A, , } {B,B, }
insert first D: {A,D, } {A, , } {B,B, } <- OK
{A, , } {A, , } {B,B,D} <- ILLEGAL (NO SPACE FOR 2ND D)
partial solution: {A,A, } {B,B,D} {D, , }
insert C,C,C: {A,A,C} {B,B,D} {D,C,C}
So the algorithm would be something like this:
// PREPARATION
Sort or group input. // {A,B,D,C,C,D,B,A,C} -> {A,A,B,B,D,D,C,C,C}
Create empty partial solution. // { , , } { , , } { , , }
Start recursion with empty partial solution and index at start of input.
// RECURSION
Receive partial solution, index, group size and last-used block.
If group size is zero:
Find group size of identical elements in input, starting at index.
Set last-used block to first block.
Find empty places in partial solution, starting at last-used block.
If index is at last group in input:
Fill empty spaces with elements of last group.
Store complete solution.
Return from recursion.
Mark duplicate blocks in partial solution.
For each block in partial solution, starting at last-used block:
If current block is not a duplicate, and has empty places,
and the places left in current and later blocks is not less than the group size:
Insert element into copy of partial solution.
Recurse with copy, index + 1, group size - 1, current block.
I tested a simple JavaScript implementation of this algorithm, and it gives the correct output.
Here's my pencil and paper algorithm:
Describe the multiset in item quantities, e.g., {(1,2),(2,2)}
f(multiset,result):
if the multiset is empty:
return result
otherwise:
call f again with each unique distribution of one element added to result and
removed from the multiset state
Example:
{(1,2),(2,2),(3,2)} n = 2
11 -> 11 22 -> 11 22 33
11 2 2 -> 11 23 23
1 1 -> 12 12 -> 12 12 33
12 1 2 -> 12 13 23
Example:
{(1,2),(2,2),(3,2)} n = 3
11 -> 112 2 -> 112 233
11 22 -> 113 223
1 1 -> 122 1 -> 122 133
12 12 -> 123 123
Let's solve the problem commented below by m69 of dealing with potential duplicate distribution:
{A,B,B,C,C,D,D,D,D}
We've reached {A, , }{B, , }{B, , }, have 2 C's to distribute
and we'd like to avoid `ac bc b` generated along with `ac b bc`.
Because our generation in the level just above is ordered, the series of identical
counts will be continuous. When a series of identical counts is encountered, make
the assignment for the whole block of identical counts (rather than each one),
and partition that contribution in descending parts; for example,
| identical |
ac b b
ac bc b // descending parts [1,0]
Example of longer block:
| identical block | descending parts
ac bcccc b b b // [4,0,0,0]
ac bccc bc b b // [3,1,0,0]
ac bcc bcc b b // [2,2,0,0]
...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With