Given a set with n elements, I need to find all the partitions of that set where there are k subsets of almost equal size.
For example, for a set with 7 elements and 3 subsets I only want partitions where there are two subsets with 2 elements each and one subset with 3 elements. I do not want a partition that has subsets with 1, 2, and 4 elements.
Put another way, there are 877 possible partitions for a set of 7 elements, but I'm only interested in the 105 partitions that are made up of subsets with 2/2/3 elements.
In reality n is around 35, which means that there are approximately 2.81 * 10^27 partitions in total, and "only" 8,338,573,669,964,101 partitions with three subsets. As such, I can't possibly calculate them all and wade through to find the ones that I want.
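For reference, the count alone is easy to get in closed form: take the multinomial coefficient n! / (s1! * s2! * ... * sk!) for the fixed subset sizes and divide by m! for every group of m equal-sized subsets. A quick sketch in Ruby (the helper name below is made up):

def count_balanced_partitions(n, k)
  small   = n / k        # size of the smaller subsets
  n_large = n % k        # how many subsets have size small + 1
  n_small = k - n_large  # how many subsets have size small
  fact = ->(x) { (1..x).inject(1, :*) }
  fact[n] / (fact[small]**n_small * fact[small + 1]**n_large *
             fact[n_small] * fact[n_large])
end

count_balanced_partitions(7, 3)   # => 105
count_balanced_partitions(35, 3)  # => 564121960420200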
What's an algorithm for calculating just the partitions of interest? (Not the number of partitions, but the actual members of each subset for each partition.)
Here's a good way to do it, generating each possibility only once by keeping everything sorted, along with a quick way to calculate the number of answers lazily:
def enum(n, k)
  # Pick smaller_n items from the list, repeated smaller_times times,
  # then pick larger_n items from the list, repeated larger_times times.
  smaller_n = n.div k
  larger_times = n % k
  smaller_times = k - larger_times
  larger_n = smaller_n + 1
  return to_enum(:enum, n, k) { calc_size(n, smaller_n, smaller_times, larger_n, larger_times) } unless block_given?
  all = [*1..n]
  # Split all into one subset to group with the smaller_n and another
  # to group with the larger_n.
  all.combination(smaller_n * smaller_times).each do |smaller|
    larger = all - smaller
    subdivide(smaller, smaller_n) do |small|
      subdivide(larger, larger_n) do |large|
        yield [*small, *large]
      end
    end
  end
end
# Subdivides elems into groups of n, keeping the elements sorted
# and generating only the sorted such combinations.
def subdivide(elems, n)
  return yield [] if elems.empty?
  # No choice for the first element, because we want to keep things sorted.
  first, *rest = elems
  rest.combination(n - 1).each do |comb|
    remain = rest - comb
    subdivide(remain, n) do |sub|
      yield [[first, *comb], *sub]
    end
  end
end
def calc_size(n, smaller_n, smaller_times, larger_n, larger_times)
  all = [
    smaller_times.times.map do |i|
      Array.new(n - i*smaller_n).combination(smaller_n)
    end,
    larger_times.times.map do |i|
      Array.new(n - smaller_times*smaller_n - i*larger_n).combination(larger_n)
    end
  ]
  # Multiply everything, but divide by the number of symmetries, because
  # we don't want to distinguish (1,2), (3,4), ... from (3,4), (1,2), ...
  all.map do |enums|
    enums.map(&:size).inject(1, :*) / enums.permutation.size
  end.inject(:*)
end
p enum(7, 3).size # => 105 (instant)
p enum(7, 3).first(5) # => [[[1, 2], [3, 4], [5, 6, 7]],
# [[1, 3], [2, 4], [5, 6, 7]],
# [[1, 4], [2, 3], [5, 6, 7]],
# [[1, 2], [3, 5], [4, 6, 7]],
# [[1, 3], [2, 5], [4, 6, 7]]]
p enum(7, 3).count # => 105 (quick)
p enum(35, 3).size # => 564121960420200 (instant)
p enum(35, 3).first(2) # => [[[1..11], [12..23], [24..35]],
# [[1..11], [12..22, 24], [23, 25..35]]]
p enum(35, 3).count # => will take forever, should return 564121960420200
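As a smaller illustration of subdivide on its own (a quick sketch, assuming the definitions above), splitting four elements into pairs yields each grouping exactly once, in sorted order:

subdivide([1, 2, 3, 4], 2) { |groups| p groups }
# [[1, 2], [3, 4]]
# [[1, 3], [2, 4]]
# [[1, 4], [2, 3]]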
Note: Just for kicks, this can also calculate the size lazily by building some enumerators and using their size, without iterating through them. This will only work in Ruby 2.0+, though, as it requires Enumerator#size.
For added fun:
require 'with_progress'
enum(16, 3).with_progress.count # => enjoy!
Important observation: Under the condition that the subset sizes must not differ by more than 1, the multiset of subset sizes is uniquely determined. We have (n % k) parts of size ceil(n / k) and (k - n % k) parts of size floor(n / k).
So let's fix the partition sizes and just enumerate all possible subsets for all partitions:
@n = n = 7
@k = k = 3
@used = n.times.map { false }
# @sizes = [2, 2, 3] for n = 7, k = 3
@sizes = (k - n % k).times.map { n / k } + (n % k).times.map { (n + k - 1) / k }
@parts = @sizes.map { |i| [0] * i }

def enumerate_part(i, j, lo, y)
  # i  = index of the current part
  # j  = index of the current number in the current part
  # lo = lower bound for numbers to choose
  # y  = output yielder
  if i == @parts.size
    y << @parts.map(&:dup)
    return
  end
  if j == @sizes[i]
    enumerate_part(i + 1, 0, @sizes[i + 1] == @sizes[i] ? @parts[i][0] : 1, y)
    return
  end
  (lo..@n).each do |x|
    next if @used[x]
    @used[x] = true
    @parts[i][j] = x
    enumerate_part(i, j + 1, x + 1, y)
    @used[x] = false
  end
end
results = Enumerator.new { |y| enumerate_part(0,0,1,y) }
puts results.count
Note that for the partitions of equal size, I assign them increasing minima to achieve uniqueness. This can cause a bit of backtracking (at most 2 levels), so it's not optimal. We might want to avoid assigning numbers where we already know that the lower bound will be too high for the next, equally sized partition(s). It gets a bit complex at that point, so I kept it simple. It is definitely possible to improve this.
It uses only O(n) space because I mutate global state instead of copying stuff around (yes, I yield a defensive copy to the enumerator, but that is not necessary if you process the results immediately in every iteration). Maybe not the most elegant solution, but the goal is to achieve good speed.
Output:
105
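As a quick check of the uniqueness constraint, one can peek at the first few generated partitions (a sketch, reusing the results enumerator above; beware that abandoning an enumeration early leaves @used partially set):

p results.first(3)
# => [[[1, 2], [3, 4], [5, 6, 7]],
#     [[1, 2], [3, 5], [4, 6, 7]],
#     [[1, 2], [3, 6], [4, 5, 7]]]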
It is no fun running this with larger n (beyond 20). I blame Ruby for that, but fortunately this particular implementation will look very similar in almost any other language.
UPDATE: Just for fun, I improved the algorithm to be completely linear in the output size (apart from the potential n factor due to maintaining the set of leftover numbers). It should be several times faster that way:
def dfs(i, j, lo, eq, sz, state)
  parts, y, n, k, left = state
  if i == k
    y << parts
    return
  end
  if j == sz
    mid = i + 1 == n % k
    neweq, newsz = mid ? [k - i - 1, sz - 1] : [eq - 1, sz]
    dfs(i + 1, 0, mid ? 1 : parts[i][0], neweq, newsz, state)
    return
  end
  higher = (j == 0) ? (eq * sz - j - 1) : (sz - j - 1)
  left.drop(higher).each_with_index do |x, idx|
    break if x < lo
    parts[i][j] = x
    left.delete_at(idx + higher)
    dfs(i, j + 1, x + 1, eq, sz, state)
    left.insert(idx + higher, x)
  end
end
def niklas(n, k)
  parts = k.times.map { |i| [0] * (i < n % k ? (n + k - 1) / k : n / k) }
  left = n.downto(1).to_a
  results = Enumerator.new { |y|
    dfs(0, 0, 1, (n % k == 0) ? k : n % k, (n + k - 1) / k, [parts, y, n, k, left])
  }
  cnt = results.count
  puts "count = #{cnt}"
end
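A quick usage sketch, assuming the definitions above; the count should agree with the first approach:

niklas(7, 3)  # should print: count = 105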