Algorithm to generate all multiset size-n partitions

Tags:

algorithm

I've been trying to figure out a way to generate all distinct size-n partitions of a multiset, but so far have come up empty handed. First let me show what I'm trying to archieve.

Let's say we have an input vector of uint32_t:

Click to copy

std::vector<uint32_t> input = {1, 1, 2, 2}

An let's say we want to create all distinct 2-size partitions. There's only two of these, namely:

Click to copy

[[1, 1], [2, 2]], [[1, 2], [1, 2]]

Note that order does not matter, i.e. all of the following are duplicate, incorrect solutions.

Duplicate because order within a permutation group does not matter:

Click to copy
```
[[2, 1], [1, 2]]
```
Duplicate because order of groups does not matter:

Click to copy
```
[[2, 2], [1, 1]]
```

Not homework of some kind BTW. I encountered this while coding something at work, but by now it is out of personal interest that I'd like to know how to deal with this. The parameters for the work-related problem were small enough that generating a couple thousand duplicate solutions didn't really matter.

Current solution (generates duplicates)

In order to illustrate that I'm not just asking without having tried to come up with a solution, let me try to explain my current algorithm (which generates duplicate solutions when used with multisets).

It works as follows: the state has a bitset with n bits set to 1 for each partition block. The length of the bitsets is size(input) - n * index_block(), e.g. if the input vector has 8 elements and n = 2, then the first partition block uses an 8-bit bitset with 2 bits set to 1, the next partition block uses a 6-bit bitset with 2 bits set to 1, etc.

A partition is created from these bitsets by iterating over each bitset in order and extracting the elements of the input vector with indices equal to the position of 1-bits in the current bitset.

In order to generate the next partition, I iterate over the bitsets in reverse order. The next bitset permutation is calculated (using a reverse of Gosper's hack). If the first bit in the current bitset is not set (i.e. vector index 0 not selected), then that bitset is reset to its starting state. Enforcing that the first bit is always set prevents generating duplicates when creating size-n set partitions (duplicates of the 2nd kind shown above). If the current bitset is equal to its starting value, this step is then repeated for the previous (longer) bitset.

This works great (and very fast) for sets. However, when used with multisets it generates duplicate solutions, since it is unaware that both elements appear more than once in the input vector. Here's some example output:

Click to copy

std::vector<uint32_t> input = {1, 2, 3, 4};
printAllSolutions(myCurrentAlgo(input, 2));
=> [[2, 1], [4, 3]], [[3, 1], [4, 2]], [[4, 1], [3, 2]]

std::vector<uint32_t> input = {1, 1, 2, 2};
printAllSolutions(myCurrentAlgo(input, 2));
=> [[1, 1], [2, 2]], [[2, 1], [2, 1]], [[2, 1], [2, 1]]

That last (duplicate) solution is generated simply because the algorithm is unaware of duplicates in the input, it generates the exact same internal states (i.e. which indices to select) in both examples.

Wanted solution

I guess it's pretty clear by now what I'm trying to end up with. Just for the sake of completeness, it would look somewhat as follows:

Click to copy

std::vector<uint32_t> multiset = {1, 1, 2, 2};
MagicClass myGenerator(multiset, 2);
do {
  std::vector<std::vector<uint32_t> > nextSolution = myGenerator.getCurrent();
  std::cout << nextSolution << std::endl;
} while (myGenerator.calcNext());
=> [[1, 1], [2, 2]]
   [[1, 2], [1, 2]]

I.e. the code would work somewhat like std::next_permutation, informing that is has generated all solutions and has ended back at the "first" solution (for whatever definition of first you want to use, probably lexicographically, but doesn't need to be).

The closest related algorithm I found is Algorithm M from Knuth's The Art of Computer Programming, Volume 4 Part 1, section 7.2.1.5 (p. 430). However, that generates all possible multiset partitions. There is also an exercise in the book (7.2.1.5.69, solution on p. 778) about how to modify Alg. M in order to generate only solutions with at most r partitions. However, that still allows partitions of different sizes (e.g. [[1, 2, 2], [1]] would be a valid output for r = 2).

Any ideas/tricks/existing algorithms on how to go about this? Note that the solution should be efficient, i.e. keeping track of all previously generated solutions, figuring out if the currently generated one is a permutation and if so skipping it, is infeasible because of the rate by which the solution space explodes for longer inputs with more duplicates.

570

asked May 27 '16 12:05

AVH

2 Answers

A recursive algorithm to distribute the elements one-by-one could be based on a few simple rules:

Start by sorting or counting the different elements; they don't have to be in any particular order, you just want to group identical elements together. (This step will simplify some of the following steps, but could be skipped.)

Click to copy

   {A,B,D,C,C,D,B,A,C} -> {A,A,B,B,D,D,C,C,C}

Start with an empty solution, and insert the elements one by one, using the following rules:

Click to copy

   { , , } { , , } { , , }

Before inserting an element, find the duplicate blocks, e.g.:

Click to copy

   {A, , } { , , } { , , }  
                    ^dup^

   {A, , } {A, , } {A, , }  
            ^dup^   ^dup^

Insert the element into every non-duplicate block with available space:

Click to copy

   partial solution: {A, , } {A, , } { , , }  
                              ^dup^

   insert element B: {A,B, } {A, , } { , , }  
                     {A, , } {A, , } {B, , }

If an identical element is already present, don't put the new element before it:

Click to copy

   partial solution:  {A, , } {B, , } { , , }  
   insert another B:  {A,B, } {B, , } { , , }  <- ILLEGAL  
                      {A, , } {B,B, } { , , }  <- OK
                      {A, , } {B, , } {B, , }  <- OK

When inserting an element of which there are another N identical elements, make sure to leave N open spots after the current element:

Click to copy

   partial solution:  {A, , } {A, , } {B,B, }  
   insert first D:    {A,D, } {A, , } {B,B, }  <- OK  
                      {A, , } {A, , } {B,B,D}  <- ILLEGAL (NO SPACE FOR 2ND D)

The last group of identical elements can be inserted in one go:

Click to copy

   partial solution:  {A,A, } {B,B,D} {D, , }  
   insert C,C,C:      {A,A,C} {B,B,D} {D,C,C}

So the algorithm would be something like this:

Click to copy

// PREPARATION  
Sort or group input.              // {A,B,D,C,C,D,B,A,C} -> {A,A,B,B,D,D,C,C,C}  
Create empty partial solution.    // { , , } { , , } { , , }  
Start recursion with empty partial solution and index at start of input.  

// RECURSION  
Receive partial solution, index, group size and last-used block.  
If group size is zero:  
    Find group size of identical elements in input, starting at index.  
    Set last-used block to first block.  
Find empty places in partial solution, starting at last-used block.  
If index is at last group in input:  
    Fill empty spaces with elements of last group.
    Store complete solution.
    Return from recursion.
Mark duplicate blocks in partial solution.  
For each block in partial solution, starting at last-used block:  
    If current block is not a duplicate, and has empty places,  
    and the places left in current and later blocks is not less than the group size:
        Insert element into copy of partial solution.
        Recurse with copy, index + 1, group size - 1, current block.

I tested a simple JavaScript implementation of this algorithm, and it gives the correct output.

134

answered Oct 23 '22 04:10

m69 says ''Пу́тин хуйло́''

Here's my pencil and paper algorithm:

Click to copy

Describe the multiset in item quantities, e.g., {(1,2),(2,2)}

f(multiset,result):
  if the multiset is empty:
    return result
  otherwise:
    call f again with each unique distribution of one element added to result and 
    removed from the multiset state


Example:
{(1,2),(2,2),(3,2)} n = 2

11       -> 11 22    -> 11 22 33
            11 2  2  -> 11 23 23
1  1     -> 12 12    -> 12 12 33
            12 1  2  -> 12 13 23


Example:
{(1,2),(2,2),(3,2)} n = 3

11      -> 112 2   -> 112 233
           11  22  -> 113 223
1   1   -> 122 1   -> 122 133
           12  12  -> 123 123

Let's solve the problem commented below by m69 of dealing with potential duplicate distribution:

Click to copy

{A,B,B,C,C,D,D,D,D}

We've reached {A, , }{B, , }{B, , }, have 2 C's to distribute
and we'd like to avoid `ac  bc  b` generated along with `ac  b   bc`.

Because our generation in the level just above is ordered, the series of identical 
counts will be continuous. When a series of identical counts is encountered, make 
the assignment for the whole block of identical counts (rather than each one), 
and partition that contribution in descending parts; for example,

      | identical |
ac     b      b
ac     bc     b     // descending parts [1,0]

Example of longer block:

      |    identical block     |  descending parts
ac     bcccc  b      b      b    // [4,0,0,0] 
ac     bccc   bc     b      b    // [3,1,0,0]
ac     bcc    bcc    b      b    // [2,2,0,0]
...

answered Oct 23 '22 03:10

גלעד ברקן

Related questions
                            
                                Longest positive sum substring
                            
                                Stochastic Hill Climbing
                            
                                Disadvantage of Using Linked Lists in Memory Management
                            
                                Representing Travelling Salesman as Linear Expression
                            
                                Why this code is getting slower when I use std::algorithms instead of plain loops?
                            
                                Reduce a string using grammar-like rules
                            
                                Give the k most frequent IP addresses from the large stream of IP address in constant time and constant space
                            
                                Guidance on Algorithmic Thinking (4 fours equation)
                            
                                why does floyd warshall just use one distance matrix?
                            
                                Is there a polynomial algorithm to find the max weighted perfect matching in a general graph?
                            
                                Optimal way to compute permutations in julia
                            
                                TicTacToe minimax algorithm returns unexpected results in 4x4 games
                            
                                Structure to run x first vs y first on 2d array
                            
                                How to encrypt a file in .net and decrypt in android
                            
                                Generate next lexicographical string in Haskell
                            
                                Find all possible distances from two arrays
                            
                                Select items from list
                            
                                Most efficient way to insert an element into sorted array and find its index
                            
                                Expected number of assignments to find maximum value in an array
                            
                                Find the sum of Fibonacci Series

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Algorithm to generate all multiset size-n partitions

Tags:

language-agnostic

algorithm

Current solution (generates duplicates)

Wanted solution

AVH

People also ask

2 Answers

m69 says ''Пу́тин хуйло́''

גלעד ברקן

Recent Activity

Donate For Us