Given a set with n elements, I need to find all the partitions of that set where there are k subsets of almost equal size. For example, for a set with 7 elements and 3 subsets I only want partitions where there are two subsets with 2 elements each and one subset with 3 elements. I do not want a partition that has subsets with 1, 2, and 4 elements. Put another way, there are 877 possible partitions for a set of 7 elements, but I'm only interested in the 105 (?) partitions that are made up subsets with 2/2/3 elements: <img src="https://i.stack.imgur.com/3JRzP.png" alt="Graphical representation of the partitions of a 7-element set where subsets have 2, 2, and 3 elements each."> In reality n is around 35, which means that there are approximately 2.81 * 1027 partitions, "only" 8,338,573,669,964,101 partitions with three subsets. As such, I can't possibly calculate them all and wade through to find the ones that I want. What's an algorithm for calculating just the partitions of interest? (Not the number of partitions, but that actual members in each sub-set for each partition.)

Important observation: Under the condition that the partition sizes must not differ by more than 1, the multiset of partition sizes is uniquely defined. We have <code>(n % k)</code> parts of size <code>ceil(n / k)</code> and <code>(k - n % k)</code> partitions of size <code>floor(n / k)</code>. So let's fix the partition sizes and just enumerate all possible subsets for all partitions: <pre class="prettyprint"><code>@n = n = 7 @k = k = 3 @used = n.times.map { false } # @sizes = [2,2,3] for n = 7, k = 3 @sizes = (k - n % k).times.map { n / k } + (n % k).times.map { (n + k - 1) / k } @parts = @sizes.map { |i| [0]*i } def enumerate_part(i, j, lo, y) # i = index of the current part # j = index of the current number in the current part # lo = lower bound for numbers to chose # y = output yielder if i == @parts.size y << @parts.map(&:dup) return end if j == @sizes[i] enumerate_part(i + 1, 0, @sizes[i + 1] == @sizes[i] ? @parts[i][0] : 1, y) return end (lo..@n).each do |x| next if @used[x] @used[x] = true @parts[i][j] = x enumerate_part(i, j + 1, x + 1, y) @used[x] = false end end results = Enumerator.new { |y| enumerate_part(0,0,1,y) } puts results.count </code></pre> Note that for the partitions of equal size, I assign them increasing minima to achieve uniqueness. This can cause a bit of backtracking (at most 2 levels), so it's not optimal. We might want to avoid assigning numbers where we already know that the lower bound will be too high for the next, equally sized partition(s). It gets a bit complex at that point, so I kept it simple. It is definitely possible to improve this. It uses only O(n) space because I mutate global state instead of copying stuff around (yes, I yield a defensive copy to the enumerator, but that is not necessary if you process the results immediately in every iteration). Maybe not the most elegant solution, but the goal is to achieve good speed. Output: <pre class="prettyprint"><code>105 </code></pre> It is no fun running this with larger n (beyond 20). I blame Ruby for that, but fortunately this particular implementation will look very similar in almost any other language. UPDATE: Just for fun, I improved the algorithm to be completely linear in the output size (apart from the potential n factor due to maintaining the set of leftover numbers). It should be several times faster that way: <pre class="prettyprint"><code>def dfs(i, j, lo, eq, sz, state) parts, y, n, k, left = state if i == k y << parts return end if j == sz mid = i+1 == n % k neweq, newsz = mid ? [k - i - 1, sz-1] : [eq - 1, sz] dfs(i+1, 0, mid ? 1 : parts[i][0], neweq, newsz, state) return end higher = ((j == 0) ? (eq * sz - j - 1) : (sz - j - 1)) left.drop(higher).each_with_index do |x,idx| break if x < lo parts[i][j] = x left.delete_at(idx + higher) dfs(i, j + 1, x + 1, eq, sz, state) left.insert(idx + higher, x) end end def niklas(n, k) parts = k.times.map{|i| [0]*(i < n%k ? (n+k-1)/k : n/k) } left = n.downto(1).to_a results = Enumerator.new { |y| dfs(0, 0, 1, (n % k == 0) ? k : n % k, (n + k - 1) / k, [parts, y, n, k, left]) } cnt = results.count puts "count = #{cnt} != #{@sz}" unless cnt == @sz end </code></pre>

Calculate set partitions with specific subset sizes

Tags:

algorithm

ruby

set

combinatorics

combinations

Given a set with n elements, I need to find all the partitions of that set where there are k subsets of almost equal size.

For example, for a set with 7 elements and 3 subsets I only want partitions where there are two subsets with 2 elements each and one subset with 3 elements. I do not want a partition that has subsets with 1, 2, and 4 elements.

Put another way, there are 877 possible partitions for a set of 7 elements, but I'm only interested in the 105 (?) partitions that are made up subsets with 2/2/3 elements:

Graphical representation of the partitions of a 7-element set where subsets have 2, 2, and 3 elements each.

In reality n is around 35, which means that there are approximately 2.81 * 10²⁷ partitions, "only" 8,338,573,669,964,101 partitions with three subsets. As such, I can't possibly calculate them all and wade through to find the ones that I want.

What's an algorithm for calculating just the partitions of interest? (Not the number of partitions, but that actual members in each sub-set for each partition.)

604

asked Mar 19 '14 22:03

Phrogz

2 Answers

Here's a good way to do it, generating all possibilities only once by taking care of keeping them sorted, as well as a quick way to calculate lazily the number of answers:

def enum(n, k)
  # Pick smaller_size items from the list, repeat smaller_n times
  # then pick larger_size items from the list, repeat larger_n times.
  smaller_n = n.div k
  larger_times = n % k
  smaller_times = k - larger_times
  larger_n = smaller_n + 1

  return to_enum(:enum, n, k) { calc_size(n, smaller_n, smaller_times, larger_n, larger_times) } unless block_given?

  all = [*1..n]
  # split all into one subset to group with the smaller_n and another
  # to group with the larger_n.
  all.combination(smaller_n * smaller_times).each do |smaller|
    larger = all - smaller
    subdivide(smaller, smaller_n) do |small|
      subdivide(larger, larger_n) do |large|
        yield [*small, *large]
      end
    end
  end
end

# Subdivides elems into groups of n, keeping the elements sorted
# and generating only the sorted such combinations.
def subdivide(elems, n)
  return yield [] if elems.empty?
  # No choice for the first element, because we want to keep things sorted.
  first, *rest = elems
  rest.combination(n - 1).each do |comb|
    remain = rest - comb
    subdivide(remain, n) do |sub|
      yield [[first, *comb], *sub]
    end
  end
end

def calc_size(n, smaller_n, smaller_times, larger_n, larger_times)
  all = [
    smaller_times.times.map do |i|
      Array.new(n - i*smaller_n).combination(smaller_n)
    end,
    larger_times.times.map do |i|
      Array.new(n - smaller_times*smaller_n - i*larger_n).combination(larger_n)
    end
  ]
  # Multiply everything, but divide by the number of symmetries, because
  # we don't want to distinguish (1,2), (3,4), ... from (3,4), (1,2), ...
  all.map do |enums|
    enums.map(&:size).inject(1, :*) / enums.permutation.size
  end.inject(:*)
end

p enum(7, 3).size      # => 105 (instant)
p enum(7, 3).first(5)  # => [[[1, 2], [3, 4], [5, 6, 7]],
                       #     [[1, 3], [2, 4], [5, 6, 7]],
                       #     [[1, 4], [2, 3], [5, 6, 7]],
                       #     [[1, 2], [3, 5], [4, 6, 7]],
                       #     [[1, 3], [2, 5], [4, 6, 7]]]
p enum(7, 3).count     # => 105 (quick)
p enum(35, 3).size     # => 564121960420200 (instant)
p enum(35, 3).first(2) # => [[[1..11], [12..23], [24..35]], 
                       #     [[1..11], [12..22, 24], [23, 25..35]]]
p enum(35, 3).count    # => will take forever, should return 564121960420200

Note: Just for kicks, this can also calculate the size lazily by building some enumerators and using size, without iterating through them. This will work in Ruby 2.0+ only though, as it require Enumerator#size.

For added fun:

require 'with_progress'
enum(16, 3).with_progress.count # => enjoy!

119

answered Sep 27 '22 02:09

Marc-André Lafortune

Important observation: Under the condition that the partition sizes must not differ by more than 1, the multiset of partition sizes is uniquely defined. We have (n % k) parts of size ceil(n / k) and (k - n % k) partitions of size floor(n / k).

So let's fix the partition sizes and just enumerate all possible subsets for all partitions:

@n = n = 7
@k = k = 3
@used = n.times.map { false }
#  @sizes = [2,2,3]  for n = 7, k = 3
@sizes = (k - n % k).times.map { n / k } + (n % k).times.map { (n + k - 1) / k }
@parts = @sizes.map { |i| [0]*i }
def enumerate_part(i, j, lo, y)
  # i = index of the current part
  # j = index of the current number in the current part
  # lo = lower bound for numbers to chose
  # y = output yielder
  if i == @parts.size
    y << @parts.map(&:dup)
    return
  end
  if j == @sizes[i]
    enumerate_part(i + 1, 0, @sizes[i + 1] == @sizes[i] ? @parts[i][0] : 1, y)
    return
  end
  (lo..@n).each do |x|
    next if @used[x]
    @used[x] = true
    @parts[i][j] = x
    enumerate_part(i, j + 1, x + 1, y)
    @used[x] = false
  end
end

results = Enumerator.new { |y| enumerate_part(0,0,1,y) }
puts results.count

Note that for the partitions of equal size, I assign them increasing minima to achieve uniqueness. This can cause a bit of backtracking (at most 2 levels), so it's not optimal. We might want to avoid assigning numbers where we already know that the lower bound will be too high for the next, equally sized partition(s). It gets a bit complex at that point, so I kept it simple. It is definitely possible to improve this.

It uses only O(n) space because I mutate global state instead of copying stuff around (yes, I yield a defensive copy to the enumerator, but that is not necessary if you process the results immediately in every iteration). Maybe not the most elegant solution, but the goal is to achieve good speed.

Output:

It is no fun running this with larger n (beyond 20). I blame Ruby for that, but fortunately this particular implementation will look very similar in almost any other language.

UPDATE: Just for fun, I improved the algorithm to be completely linear in the output size (apart from the potential n factor due to maintaining the set of leftover numbers). It should be several times faster that way:

def dfs(i, j, lo, eq, sz, state)
  parts, y, n, k, left = state
  if i == k
    y << parts
    return
  end
  if j == sz
    mid = i+1 == n % k
    neweq, newsz = mid ? [k - i - 1, sz-1] : [eq - 1, sz]
    dfs(i+1, 0, mid ? 1 : parts[i][0], neweq, newsz, state)
    return
  end
  higher = ((j == 0) ? (eq * sz - j - 1) : (sz - j - 1))
  left.drop(higher).each_with_index do |x,idx|
    break if x < lo
    parts[i][j] = x
    left.delete_at(idx + higher)
    dfs(i, j + 1, x + 1, eq, sz, state)
    left.insert(idx + higher, x)
  end
end

def niklas(n, k)
  parts = k.times.map{|i| [0]*(i < n%k ? (n+k-1)/k : n/k) }
  left = n.downto(1).to_a
  results = Enumerator.new { |y|
    dfs(0, 0, 1, (n % k == 0) ? k : n % k, (n + k - 1) / k, [parts, y, n, k, left])
  }
  cnt = results.count
  puts "count = #{cnt} != #{@sz}" unless cnt == @sz
end

answered Sep 24 '22 02:09

Niklas B.

Related questions
                            
                                Why does Rake task enhancement differ between my local environment and when deploying to Heroku Cedar?
                            
                                Need help understanding ruby's Process.detach
                            
                                Importing Existing Rails Application to RubyMine Project
                            
                                default_scope in Sequel
                            
                                Models with reserved keywords
                            
                                How do I use Ruby metaprogramming to add callbacks to a Rails model?
                            
                                Why does a range with invalid arguments sometimes not cause an argument error?
                            
                                "Closed connection error" while trying to connect Ruby to SQL server
                            
                                Calculate statistics on arrays
                            
                                Implementing a synchronization barrier in Ruby
                            
                                Why is no Ruby script found in input (LoadError)?
                            
                                Openssl error message on install log when installing ruby 1.9.3 on Mac OSX 10.6.8
                            
                                embedded document vs hash datatype in mongoid
                            
                                How to set environment variable using Chef?
                            
                                How to iterate through a yaml hash structure in ruby?
                            
                                Do Watir-Webdriver and Capybara have performance issues when compared to Webdriver?
                            
                                remote code execution in ruby with constantize
                            
                                Rails 4 API with Strong Parameters?
                            
                                Store timestamps with timezone in rails 3.2
                            
                                (Watir webdriver) How to move Browser to another screen when using multiple display in OSX 10.9

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With