I would like to implement the following algorithm. For n
and k
, consider all combinations with repetitions in sorted order where we choose k
numbers from {0,..n-1}
with repetitions. For example, if n=5
and k =3
we have:
[(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 0, 4), (0, 1, 1), (0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 2), (0, 2, 3), (0, 2, 4), (0, 3, 3), (0, 3, 4), (0, 4, 4), (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4), (1, 2, 2), (1, 2, 3), (1, 2, 4), (1, 3, 3), (1, 3, 4), (1, 4, 4), (2, 2, 2), (2, 2, 3), (2, 2, 4), (2, 3, 3), (2, 3, 4), (2, 4, 4), (3, 3, 3), (3, 3, 4), (3, 4, 4), (4, 4, 4)]
I will treat each combination as a multiset from now on. I want to greedily go through these multisets and partition the list. A partition has the property the size of the intersection of all the multisets within it must be at least k-1
. So in this case we have:
(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 0, 4)
then
(0, 1, 1), (0, 1, 2), (0, 1, 3), (0, 1, 4)
then
(0, 2, 2), (0, 2, 3), (0, 2, 4)
then
(0, 3, 3), (0, 3, 4)
then
(0, 4, 4)
and so on.
In python you can iterate over the combinations as follows:
import itertools
for multiset in itertools.combinations_with_replacement(range(5),3):
#Greedy algo
How can I create these partitions?
One problem I have is how to compute the size of the intersection of multisets. The intersection of multisets (2,1,2)
and (3,2,2)
has size 2, for example.
Here is the full answer for n=4, k=4
.
(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 0, 2), (0, 0, 0, 3)
(0, 0, 1, 1), (0, 0, 1, 2), (0, 0, 1, 3)
(0, 0, 2, 2), (0, 0, 2, 3)
(0, 0, 3, 3)
(0, 1, 1, 1), (0, 1, 1, 2), (0, 1, 1, 3)
(0, 1, 2, 2), (0, 1, 2, 3)
(0, 1, 3, 3)
(0, 2, 2, 2), (0, 2, 2, 3)
(0, 2, 3, 3), (0, 3, 3, 3)
(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 1, 3)
(1, 1, 2, 2), (1, 1, 2, 3)
(1, 1, 3, 3)
(1, 2, 2, 2), (1, 2, 2, 3)
(1, 2, 3, 3), (1, 3, 3, 3)
(2, 2, 2, 2), (2, 2, 2, 3)
(2, 2, 3, 3), (2, 3, 3, 3)
(3, 3, 3, 3)
One way to create the partitions is to iterate over your iterator and then compare each multiset* to the previous one. I tested 4 ways** to compare the multisets and the fastest I found was to test membership in
an iterator of the previous multiset that is consumed and short-circuits once the membership test fails. If the number of equal items in the multiset and the previous multiset equals the length of the multiset minus 1 then the criteria to group them is met. Then a resulting output generator of list
s is built up where you append
items that meet the criteria to the previous list
and start a new list
containing the tuple
otherwise, yield
ing the groups one at a time to minimize memory usage:
import itertools
def f(n,k):
prev, group = None, []
for multiset in itertools.combinations_with_replacement(range(n),k):
if prev:
it = iter(prev)
for idx, item in enumerate(multiset):
if item not in it:
break
if idx == len(multiset) - 1:
group.append(multiset)
continue
if group:
yield group
group = [multiset]
prev = multiset
yield group
Test cases
Input:
for item in f(4,4):
print(item)
Output:
[(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 0, 2), (0, 0, 0, 3)]
[(0, 0, 1, 1), (0, 0, 1, 2), (0, 0, 1, 3)]
[(0, 0, 2, 2), (0, 0, 2, 3)]
[(0, 0, 3, 3)]
[(0, 1, 1, 1), (0, 1, 1, 2), (0, 1, 1, 3)]
[(0, 1, 2, 2), (0, 1, 2, 3)]
[(0, 1, 3, 3)]
[(0, 2, 2, 2), (0, 2, 2, 3)]
[(0, 2, 3, 3), (0, 3, 3, 3)]
[(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 1, 3)]
[(1, 1, 2, 2), (1, 1, 2, 3)]
[(1, 1, 3, 3)]
[(1, 2, 2, 2), (1, 2, 2, 3)]
[(1, 2, 3, 3), (1, 3, 3, 3)]
[(2, 2, 2, 2), (2, 2, 2, 3)]
[(2, 2, 3, 3), (2, 3, 3, 3)]
[(3, 3, 3, 3)]
Input:
for item in f(5,3):
print(item)
Output:
[(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 0, 4)]
[(0, 1, 1), (0, 1, 2), (0, 1, 3), (0, 1, 4)]
[(0, 2, 2), (0, 2, 3), (0, 2, 4)]
[(0, 3, 3), (0, 3, 4)]
[(0, 4, 4)]
[(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4)]
[(1, 2, 2), (1, 2, 3), (1, 2, 4)]
[(1, 3, 3), (1, 3, 4)]
[(1, 4, 4)]
[(2, 2, 2), (2, 2, 3), (2, 2, 4)]
[(2, 3, 3), (2, 3, 4)]
[(2, 4, 4)]
[(3, 3, 3), (3, 3, 4)]
[(3, 4, 4), (4, 4, 4)]
* I'm calling them multisets to match your terminology but their actually tuple
s (ordered and immutable data structures); using a collections.Counter
object, for example Counter((0, 0, 0, 1))
return
s Counter({0: 3, 1: 1})
, and decrementing would be like a true multiset approach but I found this to be slower because using the order is actually useful.
** Other slower functions that give the same output that I tested:
def f2(n,k):
prev, group = None, []
for multiset in itertools.combinations_with_replacement(range(n),k):
if prev:
if sum(item1 == item2 for item1, item2 in zip(prev,multiset)) == len(multiset) - 1:
group.append(multiset)
continue
if group:
yield group
group = [multiset]
prev = multiset
yield group
def f3(n,k):
prev, group = None, []
for multiset in itertools.combinations_with_replacement(range(n),k):
if prev:
lst = list(prev)
for item in multiset:
if item in lst:
lst.remove(item)
else:
break
if len(multiset) - len(lst) == len(multiset) - 1:
group.append(multiset)
continue
if group:
yield group
group = [multiset]
prev = multiset
yield group
import collections
def f4(n,k):
prev, group = None, []
for multiset in itertools.combinations_with_replacement(range(n),k):
if prev:
if sum((collections.Counter(prev) - collections.Counter(multiset)).values()) == 1:
group.append(multiset)
continue
if group:
yield group
group = [multiset]
prev = multiset
yield group
Example timings:
from timeit import timeit
list(f(11,10)) == list(f2(11,10)) == list(f3(11,10)) == list(f4(11,10))
# True
timeit(lambda: list(f(11,10)), number = 10)
# 4.19157001003623
timeit(lambda: list(f2(11,10)), number = 10)
# 7.32002648897469
timeit(lambda: list(f3(11,10)), number = 10)
# 6.236868146806955
timeit(lambda: list(f4(11,10)), number = 10)
# 47.20136355608702
Note all approaches becomes slow for large values of n
and k
because of the large number of combinations generated.
We can view the tuples in the set/list you want to partition as numbers of length k with base n. Viewed as numbers, your algorithm is greedy on a smallest number first basis. Let the set of all numbers with k "digits" and base n be denoted N(k,n). Ignoring the fact that N(k,n) isn't exactly the list you want to partition for now, we can partition N(k,n) by the partition criteria, greedily on a smallest first basis pretty trivially; by counting from 0 (i.e. 00000 in the case of k=5 for example), and creating a new partition every time there is a carry as we count (i.e. overflow from digit i to digit i+1). I.e. the rule is: carry <=> new_partition.
Proof: Suppose A is the value after the the carry, and the carry was on into i-th digit. A shares a common prefix with all numbers in the previous partition before the carry upto but excluding the i-th, and is therefore at least 1 different. A only shares a suffix after i with one other previous (smaller) number, but that number is already in a partition with other numbers different by more than 1 from A, so A starts a new partition.
However, according to your specification, we are only considering a subset of N(k,n)
; X
, where for any x in X, x[i] <= x[j] when i > j. This adds a slight complication to the above carry <=> new partition rule. Now:
There is only one condition where carry does not imply new_partition: There has just been a carry creating a new partition, then there is another carry, caused by the x[i] <= x[j] when i > j rule. This next carry does not cause a change by more than one so doesn't imply a new partition.
Implementation:
class ExpNum:
''' Represents a number with base @base, @size digits, and funny successor semantics. '''
def __init__(self, base, size):
if size <= 0 or base <= 1:
raise Exception("Bad args")
self.size = size
self.base = base
self.number = [0]*size
self.zero = [0]*size
def increment(self):
''' Increment number by one. If we carry return index of carry else return -1. '''
carried = -1
for i in reversed(range(0, len(self.number))):
self.number[i] = (self.number[i]+1)%self.base
if self.number[i] != 0:
break
carried = i
if carried >= 0:
self.pullup()
return carried
def pullup(self):
''' Ensure x[i] <= x[j] when i > j '''
for i in range(0, len(self.number)):
if self.number[i] == 0 and i > 0:
self.number[i] = self.number[i-1]
def out_by_one_partition(self):
''' Do the partition by counting from 0 to n**k '''
self.number = [0]*self.size
just_carried = False
partition = [list(self.number)]
carried = self.increment()
while self.number != self.zero:
# Check for exception to carry => new partition.
if carried >= 0 and not (just_carried and list(self.number)[carried] == (self.base -1) and len(partition) == 1):
yield(partition)
partition = []
partition += [list(self.number)]
just_carried = carried >= 0
carried = self.increment()
yield(partition)
Test:
from ExpNum import ExpNum
from timeit import timeit
from pprint import pprint
pprint(list(ExpNum(4,4).out_by_one_partition()))
print(timeit(lambda: list(ExpNum(11,10).out_by_one_partition()), number = 10))
Test Result:
[[[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 2], [0, 0, 0, 3]],
[[0, 0, 1, 1], [0, 0, 1, 2], [0, 0, 1, 3]],
[[0, 0, 2, 2], [0, 0, 2, 3]],
[[0, 0, 3, 3]],
[[0, 1, 1, 1], [0, 1, 1, 2], [0, 1, 1, 3]],
[[0, 1, 2, 2], [0, 1, 2, 3]],
[[0, 1, 3, 3]],
[[0, 2, 2, 2], [0, 2, 2, 3]],
[[0, 2, 3, 3], [0, 3, 3, 3]],
[[1, 1, 1, 1], [1, 1, 1, 2], [1, 1, 1, 3]],
[[1, 1, 2, 2], [1, 1, 2, 3]],
[[1, 1, 3, 3]],
[[1, 2, 2, 2], [1, 2, 2, 3]],
[[1, 2, 3, 3], [1, 3, 3, 3]],
[[2, 2, 2, 2], [2, 2, 2, 3]],
[[2, 2, 3, 3], [2, 3, 3, 3]],
[[3, 3, 3, 3]]]
10.25355386902811
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With