Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate set partitions with specific subset sizes

Given a set with n elements, I need to find all the partitions of that set where there are k subsets of almost equal size.

For example, for a set with 7 elements and 3 subsets I only want partitions where there are two subsets with 2 elements each and one subset with 3 elements. I do not want a partition that has subsets with 1, 2, and 4 elements.

Put another way, there are 877 possible partitions for a set of 7 elements, but I'm only interested in the 105 (?) partitions that are made up subsets with 2/2/3 elements:

                                Graphical representation of the partitions of a 7-element set where subsets have 2, 2, and 3 elements each.

In reality n is around 35, which means that there are approximately 2.81 * 1027 partitions, "only" 8,338,573,669,964,101 partitions with three subsets. As such, I can't possibly calculate them all and wade through to find the ones that I want.

What's an algorithm for calculating just the partitions of interest? (Not the number of partitions, but that actual members in each sub-set for each partition.)

like image 604
Phrogz Avatar asked Mar 19 '14 22:03

Phrogz


People also ask

How do you determine how many partitions a set has?

If S(n, k) is the number of onto functions from an n-element set onto a k-element set, then S(n, k)/k! gives the number of partitions of an n-element set into k nonempty subsets. Hence, by the sum rule, dn = S(n,1)/1! + S(n,2)/2!

How many partitions are in a set of size 5?

The 52 partitions of a set with 5 elements. A colored region indicates a subset of X that forms a member of the enclosing partition. Uncolored dots indicate single-element subsets.

How many different subsets of size k does a set of size n have?

n! k!( n−k)! . In other words, the number of subsets of size k of an n-set is (n k ) . Page 12 Property 3 says that binomial coefficients can be calculated recursively, using Pas- cal's triangle, where each entry is the sum of the two adjacent ones in the up- per row.


2 Answers

Here's a good way to do it, generating all possibilities only once by taking care of keeping them sorted, as well as a quick way to calculate lazily the number of answers:

def enum(n, k)
  # Pick smaller_size items from the list, repeat smaller_n times
  # then pick larger_size items from the list, repeat larger_n times.
  smaller_n = n.div k
  larger_times = n % k
  smaller_times = k - larger_times
  larger_n = smaller_n + 1

  return to_enum(:enum, n, k) { calc_size(n, smaller_n, smaller_times, larger_n, larger_times) } unless block_given?

  all = [*1..n]
  # split all into one subset to group with the smaller_n and another
  # to group with the larger_n.
  all.combination(smaller_n * smaller_times).each do |smaller|
    larger = all - smaller
    subdivide(smaller, smaller_n) do |small|
      subdivide(larger, larger_n) do |large|
        yield [*small, *large]
      end
    end
  end
end

# Subdivides elems into groups of n, keeping the elements sorted
# and generating only the sorted such combinations.
def subdivide(elems, n)
  return yield [] if elems.empty?
  # No choice for the first element, because we want to keep things sorted.
  first, *rest = elems
  rest.combination(n - 1).each do |comb|
    remain = rest - comb
    subdivide(remain, n) do |sub|
      yield [[first, *comb], *sub]
    end
  end
end

def calc_size(n, smaller_n, smaller_times, larger_n, larger_times)
  all = [
    smaller_times.times.map do |i|
      Array.new(n - i*smaller_n).combination(smaller_n)
    end,
    larger_times.times.map do |i|
      Array.new(n - smaller_times*smaller_n - i*larger_n).combination(larger_n)
    end
  ]
  # Multiply everything, but divide by the number of symmetries, because
  # we don't want to distinguish (1,2), (3,4), ... from (3,4), (1,2), ...
  all.map do |enums|
    enums.map(&:size).inject(1, :*) / enums.permutation.size
  end.inject(:*)
end

p enum(7, 3).size      # => 105 (instant)
p enum(7, 3).first(5)  # => [[[1, 2], [3, 4], [5, 6, 7]],
                       #     [[1, 3], [2, 4], [5, 6, 7]],
                       #     [[1, 4], [2, 3], [5, 6, 7]],
                       #     [[1, 2], [3, 5], [4, 6, 7]],
                       #     [[1, 3], [2, 5], [4, 6, 7]]]
p enum(7, 3).count     # => 105 (quick)
p enum(35, 3).size     # => 564121960420200 (instant)
p enum(35, 3).first(2) # => [[[1..11], [12..23], [24..35]], 
                       #     [[1..11], [12..22, 24], [23, 25..35]]]
p enum(35, 3).count    # => will take forever, should return 564121960420200

Note: Just for kicks, this can also calculate the size lazily by building some enumerators and using size, without iterating through them. This will work in Ruby 2.0+ only though, as it require Enumerator#size.

For added fun:

require 'with_progress'
enum(16, 3).with_progress.count # => enjoy!
like image 119
Marc-André Lafortune Avatar answered Sep 27 '22 02:09

Marc-André Lafortune


Important observation: Under the condition that the partition sizes must not differ by more than 1, the multiset of partition sizes is uniquely defined. We have (n % k) parts of size ceil(n / k) and (k - n % k) partitions of size floor(n / k).

So let's fix the partition sizes and just enumerate all possible subsets for all partitions:

@n = n = 7
@k = k = 3
@used = n.times.map { false }
#  @sizes = [2,2,3]  for n = 7, k = 3
@sizes = (k - n % k).times.map { n / k } + (n % k).times.map { (n + k - 1) / k }
@parts = @sizes.map { |i| [0]*i }
def enumerate_part(i, j, lo, y)
  # i = index of the current part
  # j = index of the current number in the current part
  # lo = lower bound for numbers to chose
  # y = output yielder
  if i == @parts.size
    y << @parts.map(&:dup)
    return
  end
  if j == @sizes[i]
    enumerate_part(i + 1, 0, @sizes[i + 1] == @sizes[i] ? @parts[i][0] : 1, y)
    return
  end
  (lo..@n).each do |x|
    next if @used[x]
    @used[x] = true
    @parts[i][j] = x
    enumerate_part(i, j + 1, x + 1, y)
    @used[x] = false
  end
end

results = Enumerator.new { |y| enumerate_part(0,0,1,y) }
puts results.count

Note that for the partitions of equal size, I assign them increasing minima to achieve uniqueness. This can cause a bit of backtracking (at most 2 levels), so it's not optimal. We might want to avoid assigning numbers where we already know that the lower bound will be too high for the next, equally sized partition(s). It gets a bit complex at that point, so I kept it simple. It is definitely possible to improve this.

It uses only O(n) space because I mutate global state instead of copying stuff around (yes, I yield a defensive copy to the enumerator, but that is not necessary if you process the results immediately in every iteration). Maybe not the most elegant solution, but the goal is to achieve good speed.

Output:

105

It is no fun running this with larger n (beyond 20). I blame Ruby for that, but fortunately this particular implementation will look very similar in almost any other language.

UPDATE: Just for fun, I improved the algorithm to be completely linear in the output size (apart from the potential n factor due to maintaining the set of leftover numbers). It should be several times faster that way:

def dfs(i, j, lo, eq, sz, state)
  parts, y, n, k, left = state
  if i == k
    y << parts
    return
  end
  if j == sz
    mid = i+1 == n % k
    neweq, newsz = mid ? [k - i - 1, sz-1] : [eq - 1, sz]
    dfs(i+1, 0, mid ? 1 : parts[i][0], neweq, newsz, state)
    return
  end
  higher = ((j == 0) ? (eq * sz - j - 1) : (sz - j - 1))
  left.drop(higher).each_with_index do |x,idx|
    break if x < lo
    parts[i][j] = x
    left.delete_at(idx + higher)
    dfs(i, j + 1, x + 1, eq, sz, state)
    left.insert(idx + higher, x)
  end
end

def niklas(n, k)
  parts = k.times.map{|i| [0]*(i < n%k ? (n+k-1)/k : n/k) }
  left = n.downto(1).to_a
  results = Enumerator.new { |y|
    dfs(0, 0, 1, (n % k == 0) ? k : n % k, (n + k - 1) / k, [parts, y, n, k, left])
  }
  cnt = results.count
  puts "count = #{cnt} != #{@sz}" unless cnt == @sz
end
like image 34
Niklas B. Avatar answered Sep 24 '22 02:09

Niklas B.