Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way to generate combinations ordered by increasing sum of indexes

For a heuristic algorithm I need to evaluate, one after the other, the combinations of a certain set until I reach a stop criterion.

Since they are a lot, at the moment I'm generating them using the following memory efficient iterator block (inspired by python's itertools.combinations):

public static IEnumerable<T[]> GetCombinations<T>(this IList<T> pool, int r)
{
    int n = pool.Count;
    if (r > n)
        throw new ArgumentException("r cannot be greater than pool size");
    int[] indices = Enumerable.Range(0, r).ToArray();
    yield return indices.Select(idx => pool[idx]).ToArray();
    while (true)
    {
        int i;
        for (i = r - 1; i >= 0; i--)
            if (indices[i] != i + n - r)
                break;
        if (i < 0)
            break;
        indices[i] += 1;
        for (int j = i + 1; j < r; j++)
            indices[j] = indices[j - 1] + 1;
        yield return indices.Select(idx => pool[idx]).ToArray();
    }
}

The problem is, to greatly improve the efficiency of my heuristic, I'd need to generate these combinations sorted by the sum of they indexes (in other words I need to generate first, the combinations containing the first elements of the set).

e.g.
Consider the set S = {0,1,2,3,4,5}
(I choose this set for simplicity since elements and their indexes coincide).
All possible combinations of r=4 numbers generated from the given algorithm are:

(0, 1, 2, 3)  SUM:  6
(0, 1, 2, 4)  SUM:  7
(0, 1, 2, 5)  SUM:  8
(0, 1, 3, 4)  SUM:  8
(0, 1, 3, 5)  SUM:  9
(0, 1, 4, 5)  SUM: 10
(0, 2, 3, 4)  SUM:  9
(0, 2, 3, 5)  SUM: 10
(0, 2, 4, 5)  SUM: 11
(0, 3, 4, 5)  SUM: 12
(1, 2, 3, 4)  SUM: 10
(1, 2, 3, 5)  SUM: 11
(1, 2, 4, 5)  SUM: 12
(1, 3, 4, 5)  SUM: 13
(2, 3, 4, 5)  SUM: 14

where, as you can see, the combinations are not strictly sorted by ascending sum.

The desired outcome is instead the following :
(the order of the combinations having the same sum is not important)

(0, 1, 2, 3)  SUM:  6
(0, 1, 2, 4)  SUM:  7
(0, 1, 2, 5)  SUM:  8
(0, 1, 3, 4)  SUM:  8
(0, 1, 3, 5)  SUM:  9
(0, 2, 3, 4)  SUM:  9
(0, 1, 4, 5)  SUM: 10
(0, 2, 3, 5)  SUM: 10
(1, 2, 3, 4)  SUM: 10
(0, 2, 4, 5)  SUM: 11
(1, 2, 3, 5)  SUM: 11
(0, 3, 4, 5)  SUM: 12
(1, 2, 4, 5)  SUM: 12
(1, 3, 4, 5)  SUM: 13
(2, 3, 4, 5)  SUM: 14

A trivial solution would be to generate all the combinations then sort them according to their sum; but this is not really efficient/feasible since the number of combinations becomes huge as n grows.

I also had a quick look to combinatorial Gray Codes but I couldn't find anyone suitable for this problem.

Do you have an idea on how to implement something like this ?

EDIT :

This problem has an alternate (unfortunately not easier) formulation.
Given a set S and a number r, all the possible sums are trivial to find, since they are simply all the numbers from the sum of the first r elements of S to the sum of the last r elements of S.

That being said, if, for each sum T we can efficiently¹ find all the combinations having sum T we solve the original problem since we simply generate them in ascending order.

¹ efficiently means that I don't want to generate all the combinations and discard the ones having a different sum.

EDIT 2:

After @EricLippert suggestion I created the following code:

public static IEnumerable<T[]> 
GetCombinationsSortedByIndexSum<T>(this IList<T> pool, int r)
{
    int n = pool.Count;
    if (r > n)
        throw new ArgumentException("r cannot be greater than pool size");
    int minSum = ((r - 1) * r) / 2;
    int maxSum = (n * (n + 1)) / 2 - ((n - r - 1) * (n - r)) / 2;

    for (int sum = minSum; sum <= maxSum; sum++)
    {
        foreach (var indexes in AllMonotIncrSubseqOfLenMWhichSumToN(0, n - 1, r, sum))
            yield return indexes.Select(x => pool[x]).ToArray();
    }
}

static IEnumerable<IEnumerable<int>> 
AllMonotIncrSubseqOfLenMWhichSumToN(int seqFirstElement, int seqLastElement, int m, int n)
{
    for (int i = seqFirstElement; i <= seqLastElement - m + 1; i++)
    {
        if (m == 1)
        {
            if (i == n)
                yield return new int[] { i };
        }
        else
        {
            foreach (var el in AllMonotIncrSubseqOfLenMWhichSumToN(i + 1, seqLastElement, m - 1, n - i))
                yield return new int[] { i }.Concat(el);
        }
    }
}

This works fine (hopefully is what Eric meant :P) but I'm still concerned about the complexity of the recursive method. In fact it seems that we're regenerating all the combinations for each sum discarding the ones not summing up to the desired value.

To reduce the complexity of the inner function I found a way to limit the iterations by using effective upper and lower bounds (and now it's really hard to say what is the complexity of this).

Check my answer to see the final code.

like image 962
digEmAll Avatar asked May 24 '13 14:05

digEmAll


2 Answers

The solution I had in mind was:

using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
  // Preconditions:
  // * items is a sequence of non-negative monotone increasing integers
  // * n is the number of items to be in the subsequence
  // * sum is the desired sum of that subsequence.
  // Result:
  // A sequence of subsequences of the original sequence where each 
  // subsequence has n items and the given sum.
  static IEnumerable<IEnumerable<int>> M(IEnumerable<int> items, int sum, int n)
  {
    // Let's start by taking some easy outs. If the sum is negative
    // then there is no solution. If the number of items in the
    // subsequence is negative then there is no solution.

    if (sum < 0 || n < 0)
      yield break;

    // If the number of items in the subsequence is zero then
    // the only possible solution is if the sum is zero.

    if (n == 0)
    {
      if (sum == 0)
        yield return Enumerable.Empty<int>();
      yield break;
    }

    // If the number of items is less than the required number of 
    // items, there is no solution.

    if (items.Count() < n)
      yield break;

    // We have at least n items in the sequence, and
    // and n is greater than zero, so First() is valid:

    int first = items.First();

    // We need n items from a monotone increasing subsequence
    // that have a particular sum. We might already be too 
    // large to meet that requirement:

    if (n * first > sum)
      yield break;

    // There might be some solutions that involve the first element.
    // Find them all.

    foreach(var subsequence in M(items.Skip(1), sum - first, n - 1))
      yield return new[]{first}.Concat(subsequence);      

    // And there might be some solutions that do not involve the first element.
    // Find them all.

    foreach(var subsequence in M(items.Skip(1), sum, n))
      yield return subsequence;
  }
  static void Main()
  {
    int[] x = {0, 1, 2, 3, 4, 5};
    for (int i = 0; i <= 15; ++i)
      foreach(var seq in M(x, i, 4))
        Console.WriteLine("({0}) SUM {1}", string.Join(",", seq), i);
  }
}       

The output is your desired output.

I've made no attempt to optimize this. It would be interesting to profile it and see where most of the time is spent.

UPDATE: Just for fun I wrote a version that uses an immutable stack instead of an arbitrary enumerable. Enjoy!

using System;
using System.Collections.Generic;
using System.Linq;

abstract class ImmutableList<T> : IEnumerable<T>
{
  public static readonly ImmutableList<T> Empty = new EmptyList();
  private ImmutableList() {}  
  public abstract bool IsEmpty { get; }
  public abstract T Head { get; }
  public abstract ImmutableList<T> Tail { get; }
  public ImmutableList<T> Push(T newHead)
  {
    return new List(newHead, this);
  }  

  private sealed class EmptyList : ImmutableList<T>
  {
    public override bool IsEmpty { get { return true; } }
    public override T Head { get { throw new InvalidOperationException(); } }
    public override ImmutableList<T> Tail { get { throw new InvalidOperationException(); } }
  }
  private sealed class List : ImmutableList<T>
  {
    private readonly T head;
    private readonly ImmutableList<T> tail;
    public override bool IsEmpty { get { return false; } }
    public override T Head { get { return head; } }
    public override ImmutableList<T> Tail { get { return tail; } }
    public List(T head, ImmutableList<T> tail)
    {
      this.head = head;
      this.tail = tail;
    }
  }
  System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
  {
    return this.GetEnumerator();
  }
  public IEnumerator<T> GetEnumerator()
  {
    for (ImmutableList<T> current = this; !current.IsEmpty; current = current.Tail)
      yield return current.Head;
  }
}  

class Program
{
  // Preconditions:
  // * items is a sequence of non-negative monotone increasing integers
  // * n is the number of items to be in the subsequence
  // * sum is the desired sum of that subsequence.
  // Result:
  // A sequence of subsequences of the original sequence where each 
  // subsequence has n items and the given sum.
  static IEnumerable<ImmutableList<int>> M(ImmutableList<int> items, int sum, int n)
  {
    // Let's start by taking some easy outs. If the sum is negative
    // then there is no solution. If the number of items in the
    // subsequence is negative then there is no solution.

    if (sum < 0 || n < 0)
      yield break;

    // If the number of items in the subsequence is zero then
    // the only possible solution is if the sum is zero.
    if (n == 0)
    {
      if (sum == 0)
        yield return ImmutableList<int>.Empty;
      yield break;
    }

    // If the number of items is less than the required number of 
    // items, there is no solution.

    if (items.Count() < n)
      yield break;

    // We have at least n items in the sequence, and
    // and n is greater than zero.
    int first = items.Head;

    // We need n items from a monotone increasing subsequence
    // that have a particular sum. We might already be too 
    // large to meet that requirement:

    if (n * first > sum)
      yield break;

    // There might be some solutions that involve the first element.
    // Find them all.

    foreach(var subsequence in M(items.Tail, sum - first, n - 1))
      yield return subsequence.Push(first);      

    // And there might be some solutions that do not involve the first element.
    // Find them all.
    foreach(var subsequence in M(items.Tail, sum, n))
      yield return subsequence;
  }
  static void Main()
  {
    ImmutableList<int> x = ImmutableList<int>.Empty.Push(5).
                           Push(4).Push(3).Push(2).Push(1).Push(0);
    for (int i = 0; i <= 15; ++i)
      foreach(var seq in M(x, i, 4))
        Console.WriteLine("({0}) SUM {1}", string.Join(",", seq), i);
  }
}       
like image 130
Eric Lippert Avatar answered Sep 21 '22 00:09

Eric Lippert


For the sake of completeness and clarity I'll post my final code:

// Given a pool of elements returns all the 
// combinations of the groups of lenght r in pool, 
// such that the combinations are ordered (ascending) by the sum of 
// the indexes of the elements.
// e.g. pool = {A,B,C,D,E} r = 3
// returns
// (A, B, C)   indexes: (0, 1, 2)   sum: 3
// (A, B, D)   indexes: (0, 1, 3)   sum: 4
// (A, B, E)   indexes: (0, 1, 4)   sum: 5
// (A, C, D)   indexes: (0, 2, 3)   sum: 5
// (A, C, E)   indexes: (0, 2, 4)   sum: 6
// (B, C, D)   indexes: (1, 2, 3)   sum: 6
// (A, D, E)   indexes: (0, 3, 4)   sum: 7
// (B, C, E)   indexes: (1, 2, 4)   sum: 7
// (B, D, E)   indexes: (1, 3, 4)   sum: 8
// (C, D, E)   indexes: (2, 3, 4)   sum: 9
public static IEnumerable<T[]>
GetCombinationsSortedByIndexSum<T>(this IList<T> pool, int r)
{
    int n = pool.Count;
    if (r > n)
        throw new ArgumentException("r cannot be greater than pool size");
    int minSum = F(r - 1);
    int maxSum = F(n) - F(n - r - 1);

    for (int sum = minSum; sum <= maxSum; sum++)
    {
        foreach (var indexes in AllSubSequencesWithGivenSum(0, n - 1, r, sum))
            yield return indexes.Select(x => pool[x]).ToArray();
    }
}


// Given a start element and a last element of a sequence of consecutive integers
// returns all the monotonically increasing subsequences of length "m" having sum "sum"
// e.g. seqFirstElement = 1, seqLastElement = 5, m = 3, sum = 8
//      returns {1,2,5} and {1,3,4}
static IEnumerable<IEnumerable<int>>
AllSubSequencesWithGivenSum(int seqFirstElement, int seqLastElement, int m, int sum)
{
    int lb = sum - F(seqLastElement) + F(seqLastElement - m + 1);
    int ub = sum - F(seqFirstElement + m - 1) + F(seqFirstElement);

    lb = Math.Max(seqFirstElement, lb);
    ub = Math.Min(seqLastElement - m + 1, ub);

    for (int i = lb; i <= ub; i++)
    {
        if (m == 1)
        {
            if (i == sum) // this check shouldn't be necessary anymore since LB/UB should automatically exclude wrong solutions
                yield return new int[] { i };
        }
        else
        {
            foreach (var el in AllSubSequencesWithGivenSum(i + 1, seqLastElement, m - 1, sum - i))
                yield return new int[] { i }.Concat(el);
        }
    }
}

// Formula to compute the sum of the numbers from 0 to n
// e.g. F(4) = 0 + 1 + 2 + 3 + 4 = 10
static int F(int n)
{
    return (n * (n + 1)) / 2;
}
like image 28
digEmAll Avatar answered Sep 24 '22 00:09

digEmAll