
Generating non-consecutive combinations

I am trying to create a generator (an iterator supporting next(), perhaps using yield in Python) which gives all combinations of r elements from {1, 2, ..., n} (n and r are parameters) such that no two of the selected r elements are consecutive.

For example, for r = 2 and n= 4

The generated combinations are {1,3}, {1,4}, {2, 4}.

I could generate all combinations (as an iterator) and filter those which don't satisfy the criteria, but that would do unnecessary work.
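For reference, that filter-based approach might look like this (a quick sketch using itertools; the function name is mine):

```python
from itertools import combinations

def filtered_nonconsecutive(n, r):
    """Naive approach: generate all r-combinations of 1..n and keep
    only those with no two consecutive elements (does wasted work)."""
    for combo in combinations(range(1, n + 1), r):
        if all(b - a >= 2 for a, b in zip(combo, combo[1:])):
            yield combo

print(list(filtered_nonconsecutive(4, 2)))  # [(1, 3), (1, 4), (2, 4)]
```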

Is there some generation algorithm such that next is O(1) (and if that is not possible, O(r) or O(n))?

The order in which the sets are returned is not relevant (and hopefully will allow an O(1) algorithm).

Note: I have tagged it python, but a language-agnostic algorithm will help too.

Update:

I have found a way to map it to generating pure combinations! A web search reveals that O(1) is possible for combinations (though it seems complicated).

Here is the mapping.

Suppose we have a combination x_1, x_2, ... , x_r with x_1 + 1 < x_2, x_2 + 1 < x_3, ...

We map to y_1, y_2, ..., y_r as follows

y_1 = x_1

y_2 = x_2 - 1

y_3 = x_3 - 2

...

y_r = x_r - (r-1)

This way we have that y_1 < y_2 < y_3 ... without the non-consecutive constraint!

This basically amounts to choosing r elements out of n-r+1. Thus all I need to do is run the generation for (n-r+1 choose r).

For our purposes, using the mapping after things are generated is good enough.
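A sketch of that post-generation mapping (the function name is mine): generate the plain combinations of r elements out of {1, ..., n-r+1} with itertools, then add back the offsets:

```python
from itertools import combinations

def nonconsecutive_via_mapping(n, r):
    # choose y_1 < y_2 < ... < y_r from {1, ..., n-r+1}, then map back
    # with x_i = y_i + (i-1) to restore the non-consecutive gaps
    for ys in combinations(range(1, n - r + 2), r):
        yield tuple(y + i for i, y in enumerate(ys))

print(list(nonconsecutive_via_mapping(4, 2)))  # [(1, 3), (1, 4), (2, 4)]
```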

Reasons for choosing svckr's answer

All great answers, but I have chosen svckr's answer.

Here are some reasons why

  1. It is effectively stateless (or "Markovian", to be more precise). The next combination can be generated from the previous one. It is in a way almost optimal: O(r) space and time.

  2. It is predictable. We know exactly the order (lexicographic) in which the combinations are generated.

These two properties make it easy to parallelise the generation (split at predictable points and delegate), with fault tolerance thrown in (can pick off from the last generated combination if a CPU/machine fails)!

Sorry, parallelisation was not mentioned earlier; it didn't occur to me when I wrote the question, and I only got the idea later.
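For what it's worth, the predictable lexicographic order even allows computing the k-th combination directly, which gives the split points for parallelisation. Here is a sketch of mine using the standard combinatorial number system applied to the mapped problem (it assumes 0 <= k < C(n-r+1, r); math.comb needs Python 3.8+):

```python
from math import comb

def unrank_nonconsecutive(n, r, k):
    """Return the k-th (0-based, lexicographic) non-consecutive
    r-combination of {1, ..., n}, via the gap-removal mapping."""
    m = n - r + 1          # unrank in the mapped problem: r out of m
    result = []
    x = 1                  # smallest value still available
    for i in range(r, 0, -1):
        # skip values whose block of combinations ends before index k
        while comb(m - x, i - 1) <= k:
            k -= comb(m - x, i - 1)
            x += 1
        result.append(x + (r - i))   # map back: re-insert the gap
        x += 1
    return tuple(result)

print([unrank_nonconsecutive(4, 2, k) for k in range(3)])
# [(1, 3), (1, 4), (2, 4)]
```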

Knoothe asked Mar 14 '13 23:03


2 Answers

This is fun! How about this:

def nonconsecutive_combinations(r, n):
  # first combination: starting at 1, step size 2
  combination = list(range(1, r*2, 2))
  # as long as all items are less than or equal to n
  while combination[r-1] <= n:
    yield tuple(combination)
    p = r-1 # pointer to the element we will increment
    a = 1   # offset such that combination[p] + a is the new last element
    # find the rightmost element we can increment without
    # making the last element bigger than n
    while p > 0 and combination[p] + a > n:
      p -= 1
      a += 2
    # increment that item and
    # fill the tail of the list with increments of two
    combination[p:] = range(combination[p]+1, combination[p] + 2*(r-p), 2)

Each next() call should be O(r). I got the idea while thinking about how this would translate to natural numbers, but it took quite some time to get the details right.

> list(nonconsecutive_combinations(2, 4))
[(1, 3), (1, 4), (2, 4)]
> list(nonconsecutive_combinations(3, 6))
[(1, 3, 5), (1, 3, 6), (1, 4, 6), (2, 4, 6)]

Let me try to explain how this works.

Conditions for a tuple c with r elements to be part of the result set:

  1. Any element of the tuple is at least as large as the preceding element plus 2. (c[x] >= c[x-1] + 2)
  2. All elements are less than, or equal to n. Because of 1. it is sufficient to say that the last element is less than or equal to n. (c[r-1] <= n)
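Written as a predicate, just to make the two conditions concrete (this helper is mine, not part of the generator):

```python
def is_valid(c, n):
    # condition 1: each element is at least 2 above its predecessor
    # condition 2: the last element is at most n
    return all(c[x] >= c[x - 1] + 2 for x in range(1, len(c))) and c[-1] <= n

print(is_valid((1, 3), 4), is_valid((1, 2), 4), is_valid((2, 5), 4))
# True False False
```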

The smallest tuple that may be part of the result set is (1, 3, 5, ..., 2*r-1). When I say a tuple is "smaller" than another, I am assuming the lexicographical order.

As Blckknght points out, even the smallest possible tuple may be too large to satisfy condition 2.

The function above contains two while loops:

  • The outer loop steps through the results, assuming they appear in lexicographic order and satisfy condition 1. As soon as the tuple in question violates condition 2, we know we have exhausted the result set and are done:

    combination = list(range(1, r*2, 2))
    while combination[r-1] <= n:
    

    The first line initializes the result tuple with the first possible result according to condition 1. The second line translates directly to condition 2.

  • The inner loop finds the next possible tuple satisfying condition one.

    yield tuple(combination)
    

    Since the while condition (condition 2) is true and we made sure the result satisfies condition 1, we can yield the current result tuple.

    Next, to find the lexicographically next tuple, we would add "1" to the last element.

    # we don't actually do this:
    combination[r-1] += 1
    

    However, that may break condition 2 too early. So, if that operation would break condition 2, we increment the preceding element and adjust the last element accordingly. This is a little like counting in base 10: "If the last digit would exceed 9, increment the previous digit and make the last digit a 0." But instead of filling with zeros, we fill the tuple so that condition 1 holds.

    # if above does not work
    combination[r-2] += 1
    combination[r-1]  = combination[r-2] + 2
    

    The problem is, the second line may break condition 2 yet again. So what we actually do is keep track of the required increment in a, and use the variable p to hold the index of the element we are currently looking at.

    p = r-1
    a = 1
    while p > 0 and combination[p] + a > n:
      p -= 1
      a += 2
    

    We iterate right-to-left (p = r-1, p -= 1) through the items of the result tuple. Initially we want to add one to the last item (a = 1), but as we step left through the tuple we actually want to replace the last item with the value of a preceding item plus 2*x, where x is the distance between the two items (a += 2, combination[p] + a).

    Finally, we have found the item we want to increment, and fill the rest of the tuple with a sequence starting at the incremented item, with a step size of 2:

    combination[p:] = range(combination[p]+1, combination[p] + 2*(r-p), 2)
    

    And that's it. It seemed so easy when I first thought of it, but all the arithmetic throughout the function makes it a great place for off-by-one errors, and describing it is harder than it should be. I should have known I was in trouble when I added that inner loop :)

On performance ..

Unfortunately, while loops full of arithmetic are not the most efficient thing to write in Python. The other solutions accept that reality and use list comprehensions or filtering to push the heavy lifting down into the Python runtime. That seems to me to be the right thing to do.

On the other hand, I'm quite certain that my solution would perform a lot better than most if this were C. The inner while loop runs at most r times per result, and it mutates the result in place. It does not consume additional stack frames and does not consume any memory besides the result and two variables. But obviously this is not C, so none of this really matters.

svckr answered Nov 08 '22 18:11


here is my recursive generator (it just skips the (n+1)th item if the nth item is selected):

def non_consecutive_combinator(rnge, r, prev=[]):
    if r == 0:
        yield prev

    else:
        for i, item in enumerate(rnge):
            for next_comb in non_consecutive_combinator(rnge[i+2:], r-1, prev+[item]):
                yield next_comb

print(list(non_consecutive_combinator([1,2,3,4], 2)))
#[[1, 3], [1, 4], [2, 4]]
print(list(non_consecutive_combinator([1,2,3,4,5], 2)))
#[[1, 3], [1, 4], [1, 5], [2, 4], [2, 5], [3, 5]]
print(list(non_consecutive_combinator(range(1, 10), 3)))
#[[1, 3, 5], [1, 3, 6], [1, 3, 7], [1, 3, 8], [1, 3, 9], [1, 4, 6], [1, 4, 7], [1, 4, 8], [1, 4, 9], [1, 5, 7], [1, 5, 8], [1, 5, 9], [1, 6, 8], [1, 6, 9], [1, 7, 9], [2, 4, 6], [2, 4, 7], [2, 4, 8], [2, 4, 9], [2, 5, 7], [2, 5, 8], [2, 5, 9], [2, 6, 8], [2, 6, 9], [2, 7, 9], [3, 5, 7], [3, 5, 8], [3, 5, 9], [3, 6, 8], [3, 6, 9], [3, 7, 9], [4, 6, 8], [4, 6, 9], [4, 7, 9], [5, 7, 9]]

on efficiency :

this code can't be O(1), because traversing the stack and building a new collection on each iteration won't be O(1). also, a recursive generator means you need up to r stack depth to get an r-item combination. this means with low r, the cost of the call stack can be more expensive than non-recursive generation. with big enough n and r, it is potentially much more efficient than the itertools-based solution.
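for what it's worth, the recursion can be converted to an explicit stack (a sketch of mine, not part of the benchmark below), so the call-stack depth limit goes away while the output order is preserved:

```python
def non_consecutive_iterative(rnge, r):
    # explicit-stack version of the recursive generator above;
    # each stack entry is (chosen prefix, next usable index into rnge)
    stack = [([], 0)]
    while stack:
        prev, start = stack.pop()
        if len(prev) == r:
            yield prev
            continue
        # push candidates in reverse so they pop in lexicographic order;
        # choosing index i makes i+1 unusable (no consecutive elements)
        for i in range(len(rnge) - 1, start - 1, -1):
            stack.append((prev + [rnge[i]], i + 2))

print(list(non_consecutive_iterative([1, 2, 3, 4], 2)))
# [[1, 3], [1, 4], [2, 4]]
```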

i've tested two uploaded codes in this question:

from collections import deque
from itertools import ifilter, combinations  # ifilter is Python 2; use filter on Python 3
import timeit

def filtered_combi(n, r):
    def good(combo):
        return not any(combo[i]+1 == combo[i+1] for i in range(len(combo)-1))
    return ifilter(good, combinations(range(1, n+1), r))

def non_consecutive_combinator(rnge, r, prev=[]):
    if r == 0:
        yield prev

    else:
        for i, item in enumerate(rnge):
            for next_comb in non_consecutive_combinator(rnge[i+2:], r-1, prev+[item]):
                yield next_comb

def wrapper(n, r):
    return non_consecutive_combinator(range(1, n+1), r)   

def consume(f, *args, **kwargs):
    deque(f(*args, **kwargs))

t = timeit.timeit(lambda : consume(wrapper, 30, 4), number=100)
f = timeit.timeit(lambda : consume(filtered_combi, 30, 4), number=100)

results (on windows 7, 64-bit python 2.7.3, core i5 ivy bridge with 8gb ram):

(n, r)  recursive   itertools
----------------------------------------
(30, 4) 1.6728046   4.0149797   100 times   17550 combinations
(20, 4) 2.6734657   7.0835997   1000 times  2380 combinations
(10, 4) 0.1253318   0.3157737   1000 times  35 combinations
(4, 2)  0.0091073   0.0120918   1000 times  3 combinations
(20, 5) 0.6275073   2.4236898   100 times   4368 combinations
(20, 6) 1.0542227   6.1903468   100 times   5005 combinations
(20, 7) 1.3339530   12.4065561  100 times   3432 combinations
(20, 8) 1.4118724   19.9793801  100 times   1287 combinations
(20, 9) 1.4116702   26.1977839  100 times   220 combinations

as you can see, the gap between the recursive solution and the itertools.combinations-based solution widens as n goes up.

in fact, the gap between the two solutions depends heavily on r - a bigger r means you throw away more of the combinations generated by itertools.combinations. for example, in the case of n=20, r=9 we filter and keep only 220 combinations out of 167960 (20C9). if n and r are small, using itertools.combinations can be faster, because it is more efficient at small r and does not use the stack, as I explained. (as you can see, itertools is very optimized - if you write your logic with for, if, while and a bunch of generators and list comprehensions, it won't be as fast as the abstracted itertools version - and this is one of the reasons people love python: you bring your code to a higher level, and you get rewarded. not many languages do that.)
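the 220-out-of-167960 figure also follows from the mapping in the question - the number of surviving combinations is (n-r+1 choose r). a quick check (math.comb needs python 3.8+):

```python
from math import comb

def count_nonconsecutive(n, r):
    # by the gap-removal bijection, exactly C(n-r+1, r) combinations survive
    return comb(n - r + 1, r)

print(count_nonconsecutive(20, 9), comb(20, 9))  # 220 167960
```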

thkang answered Nov 08 '22 17:11