Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to generate all multiset size-n partitions

I've been trying to figure out a way to generate all distinct size-n partitions of a multiset, but so far have come up empty handed. First let me show what I'm trying to archieve.

Let's say we have an input vector of uint32_t:

std::vector<uint32_t> input = {1, 1, 2, 2}

An let's say we want to create all distinct 2-size partitions. There's only two of these, namely:

[[1, 1], [2, 2]], [[1, 2], [1, 2]]

Note that order does not matter, i.e. all of the following are duplicate, incorrect solutions.

  • Duplicate because order within a permutation group does not matter:

    [[2, 1], [1, 2]]
    
  • Duplicate because order of groups does not matter:

    [[2, 2], [1, 1]]
    

Not homework of some kind BTW. I encountered this while coding something at work, but by now it is out of personal interest that I'd like to know how to deal with this. The parameters for the work-related problem were small enough that generating a couple thousand duplicate solutions didn't really matter.

Current solution (generates duplicates)

In order to illustrate that I'm not just asking without having tried to come up with a solution, let me try to explain my current algorithm (which generates duplicate solutions when used with multisets).

It works as follows: the state has a bitset with n bits set to 1 for each partition block. The length of the bitsets is size(input) - n * index_block(), e.g. if the input vector has 8 elements and n = 2, then the first partition block uses an 8-bit bitset with 2 bits set to 1, the next partition block uses a 6-bit bitset with 2 bits set to 1, etc.

A partition is created from these bitsets by iterating over each bitset in order and extracting the elements of the input vector with indices equal to the position of 1-bits in the current bitset.

In order to generate the next partition, I iterate over the bitsets in reverse order. The next bitset permutation is calculated (using a reverse of Gosper's hack). If the first bit in the current bitset is not set (i.e. vector index 0 not selected), then that bitset is reset to its starting state. Enforcing that the first bit is always set prevents generating duplicates when creating size-n set partitions (duplicates of the 2nd kind shown above). If the current bitset is equal to its starting value, this step is then repeated for the previous (longer) bitset.

This works great (and very fast) for sets. However, when used with multisets it generates duplicate solutions, since it is unaware that both elements appear more than once in the input vector. Here's some example output:

std::vector<uint32_t> input = {1, 2, 3, 4};
printAllSolutions(myCurrentAlgo(input, 2));
=> [[2, 1], [4, 3]], [[3, 1], [4, 2]], [[4, 1], [3, 2]]

std::vector<uint32_t> input = {1, 1, 2, 2};
printAllSolutions(myCurrentAlgo(input, 2));
=> [[1, 1], [2, 2]], [[2, 1], [2, 1]], [[2, 1], [2, 1]]

That last (duplicate) solution is generated simply because the algorithm is unaware of duplicates in the input, it generates the exact same internal states (i.e. which indices to select) in both examples.

Wanted solution

I guess it's pretty clear by now what I'm trying to end up with. Just for the sake of completeness, it would look somewhat as follows:

std::vector<uint32_t> multiset = {1, 1, 2, 2};
MagicClass myGenerator(multiset, 2);
do {
  std::vector<std::vector<uint32_t> > nextSolution = myGenerator.getCurrent();
  std::cout << nextSolution << std::endl;
} while (myGenerator.calcNext());
=> [[1, 1], [2, 2]]
   [[1, 2], [1, 2]]

I.e. the code would work somewhat like std::next_permutation, informing that is has generated all solutions and has ended back at the "first" solution (for whatever definition of first you want to use, probably lexicographically, but doesn't need to be).

The closest related algorithm I found is Algorithm M from Knuth's The Art of Computer Programming, Volume 4 Part 1, section 7.2.1.5 (p. 430). However, that generates all possible multiset partitions. There is also an exercise in the book (7.2.1.5.69, solution on p. 778) about how to modify Alg. M in order to generate only solutions with at most r partitions. However, that still allows partitions of different sizes (e.g. [[1, 2, 2], [1]] would be a valid output for r = 2).

Any ideas/tricks/existing algorithms on how to go about this? Note that the solution should be efficient, i.e. keeping track of all previously generated solutions, figuring out if the currently generated one is a permutation and if so skipping it, is infeasible because of the rate by which the solution space explodes for longer inputs with more duplicates.

like image 570
AVH Avatar asked May 27 '16 12:05

AVH


People also ask

How to calculate the total number of partitions of n elements?

Total count = k * S (n-1, k) + S (n-1, k-1). Create a recursive function which accepts two parameters, n and k. The function returns total number of partitions of n elements into k sets. Handle the base cases. If n = 0 or k = 0 or k > n return 0 as there cannot be any subset. If n is equal to k or k is equal to 1 return 1.

How do you partition a non-one value?

Let the index of non-one value be k. Decrease the value of p [k] by 1 and increase rem_val by 1. Now there may be two cases: a) If p [k] is more than or equal to rem_val. This is a simple case (we have the sorted order in new partition). Put rem_val at p [k+1] and p [0…k+1] is our new partition.

How do you divide a nth element into k partitions?

There are two cases. The previous n – 1 elements are divided into k partitions, i.e S (n-1, k) ways. Put this nth element into one of the previous k partitions.

How to get the next partition from a given partition?

We initialize p [] as n where n is the input number. In every iteration. we first print p [] and then update p [] to store the next partition. So the main problem is to get the next partition from a given partition. We are given current partition in p [] and its size. We need to update p [] to store next partition.


2 Answers

A recursive algorithm to distribute the elements one-by-one could be based on a few simple rules:

  • Start by sorting or counting the different elements; they don't have to be in any particular order, you just want to group identical elements together. (This step will simplify some of the following steps, but could be skipped.)
   {A,B,D,C,C,D,B,A,C} -> {A,A,B,B,D,D,C,C,C}  
  • Start with an empty solution, and insert the elements one by one, using the following rules:
   { , , } { , , } { , , }  
  • Before inserting an element, find the duplicate blocks, e.g.:
   {A, , } { , , } { , , }  
                    ^dup^

   {A, , } {A, , } {A, , }  
            ^dup^   ^dup^
  • Insert the element into every non-duplicate block with available space:
   partial solution: {A, , } {A, , } { , , }  
                              ^dup^

   insert element B: {A,B, } {A, , } { , , }  
                     {A, , } {A, , } {B, , }  
  • If an identical element is already present, don't put the new element before it:
   partial solution:  {A, , } {B, , } { , , }  
   insert another B:  {A,B, } {B, , } { , , }  <- ILLEGAL  
                      {A, , } {B,B, } { , , }  <- OK
                      {A, , } {B, , } {B, , }  <- OK
  • When inserting an element of which there are another N identical elements, make sure to leave N open spots after the current element:
   partial solution:  {A, , } {A, , } {B,B, }  
   insert first D:    {A,D, } {A, , } {B,B, }  <- OK  
                      {A, , } {A, , } {B,B,D}  <- ILLEGAL (NO SPACE FOR 2ND D)  
  • The last group of identical elements can be inserted in one go:
   partial solution:  {A,A, } {B,B,D} {D, , }  
   insert C,C,C:      {A,A,C} {B,B,D} {D,C,C}  

So the algorithm would be something like this:

// PREPARATION  
Sort or group input.              // {A,B,D,C,C,D,B,A,C} -> {A,A,B,B,D,D,C,C,C}  
Create empty partial solution.    // { , , } { , , } { , , }  
Start recursion with empty partial solution and index at start of input.  

// RECURSION  
Receive partial solution, index, group size and last-used block.  
If group size is zero:  
    Find group size of identical elements in input, starting at index.  
    Set last-used block to first block.  
Find empty places in partial solution, starting at last-used block.  
If index is at last group in input:  
    Fill empty spaces with elements of last group.
    Store complete solution.
    Return from recursion.
Mark duplicate blocks in partial solution.  
For each block in partial solution, starting at last-used block:  
    If current block is not a duplicate, and has empty places,  
    and the places left in current and later blocks is not less than the group size:
        Insert element into copy of partial solution.
        Recurse with copy, index + 1, group size - 1, current block.

I tested a simple JavaScript implementation of this algorithm, and it gives the correct output.

like image 134

Here's my pencil and paper algorithm:

Describe the multiset in item quantities, e.g., {(1,2),(2,2)}

f(multiset,result):
  if the multiset is empty:
    return result
  otherwise:
    call f again with each unique distribution of one element added to result and 
    removed from the multiset state


Example:
{(1,2),(2,2),(3,2)} n = 2

11       -> 11 22    -> 11 22 33
            11 2  2  -> 11 23 23
1  1     -> 12 12    -> 12 12 33
            12 1  2  -> 12 13 23


Example:
{(1,2),(2,2),(3,2)} n = 3

11      -> 112 2   -> 112 233
           11  22  -> 113 223
1   1   -> 122 1   -> 122 133
           12  12  -> 123 123

Let's solve the problem commented below by m69 of dealing with potential duplicate distribution:

{A,B,B,C,C,D,D,D,D}

We've reached {A, , }{B, , }{B, , }, have 2 C's to distribute
and we'd like to avoid `ac  bc  b` generated along with `ac  b   bc`.

Because our generation in the level just above is ordered, the series of identical 
counts will be continuous. When a series of identical counts is encountered, make 
the assignment for the whole block of identical counts (rather than each one), 
and partition that contribution in descending parts; for example,

      | identical |
ac     b      b
ac     bc     b     // descending parts [1,0]

Example of longer block:

      |    identical block     |  descending parts
ac     bcccc  b      b      b    // [4,0,0,0] 
ac     bccc   bc     b      b    // [3,1,0,0]
ac     bcc    bcc    b      b    // [2,2,0,0]
...
like image 34
גלעד ברקן Avatar answered Oct 23 '22 03:10

גלעד ברקן