Generating all Possible Combinations

Sure thing. It is a bit tricky to do this with LINQ but certainly possible using only the standard query operators.

UPDATE: This is the subject of my blog on Monday June 28th 2010; thanks for the great question. Also, a commenter on my blog noted that there is an even more elegant query than the one I gave. I'll update the code here to use it.

The tricky part is to make the Cartesian product of arbitrarily many sequences. "Zipping" in the letters is trivial compared to that. You should study this to make sure that you understand how it works. Each part is simple enough but the way they are combined together takes some getting used to:

static IEnumerable<IEnumerable<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)
{
    IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>()};
    return sequences.Aggregate(
        emptyProduct,
        (accumulator, sequence) =>
            from accseq in accumulator
            from item in sequence
            select accseq.Concat(new[] { item })
        );
}

To explain how this works, first understand what the "accumulate" operation (Aggregate, in LINQ) is doing. The simplest accumulate operation is "add everything in this sequence together". The way you do that is: start with zero. For each item in the sequence, the new value of the accumulator is the sum of the item and the previous value of the accumulator. We're doing the same thing, except that instead of accumulating the sum based on the sum so far and the current item, we're accumulating the Cartesian product as we go.
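As a minimal sketch of that simplest accumulation, here is the sum written with Aggregate. The class and method names are made up for illustration; only Aggregate itself comes from LINQ.

```csharp
using System;
using System.Linq;

public static class AggregateSumDemo
{
    // Sums a sequence by folding: start at zero, then add each item
    // to the running value of the accumulator.
    public static int Sum(int[] numbers)
    {
        return numbers.Aggregate(
            0,                                          // seed: the empty sum
            (accumulator, item) => accumulator + item); // next accumulator value
    }

    public static void Main()
    {
        Console.WriteLine(Sum(new[] { 1, 2, 3, 4 })); // prints 10
    }
}
```

The Cartesian product version has exactly this shape; only the seed and the combining function change.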

The way we're going to do that is to take advantage of the fact that we already have an operator in LINQ that computes the Cartesian product of two things:

from x in xs
from y in ys
do something with each possible (x, y)
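For a concrete instance of that pattern, here is a small sketch in which "do something" is simply gluing x and y together into a string. The names PairProductDemo and Pairs are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairProductDemo
{
    // Pairs every x with every y. Two consecutive from clauses
    // are translated by the compiler into a SelectMany call.
    public static List<string> Pairs(int[] xs, string[] ys)
    {
        var query =
            from x in xs
            from y in ys
            select x + y;   // "do something": concatenate into a string
        return query.ToList();
    }

    public static void Main()
    {
        // Prints 1a, 1b, 2a, 2b, one per line.
        foreach (var pair in Pairs(new[] { 1, 2 }, new[] { "a", "b" }))
            Console.WriteLine(pair);
    }
}
```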

By repeatedly taking the Cartesian product of the accumulator with the next item in the input sequence and doing a little pasting together of the results, we can generate the Cartesian product as we go.

So think about the value of the accumulator. For illustrative purposes I'm going to show the value of the accumulator as the results of the sequence operators it contains. That is not what the accumulator actually contains. What the accumulator actually contains is the operators that produce these results. The whole operation here just builds up a massive tree of sequence operators, the result of which is the Cartesian product. But the final Cartesian product itself is not actually computed until the query is executed. For illustrative purposes I'll show what the results are at each stage of the way but remember, this actually contains the operators that produce those results.
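If you want to convince yourself that nothing is computed until the query is executed, one way is to count how many items are actually read from the inputs. The sketch below does that; Counted, Product and Run are hypothetical helper names, and Product is just the same query shape specialized to int.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    static int itemsRead;   // how many items have been pulled from the inputs

    // Wraps a sequence so that every item actually read is counted.
    static IEnumerable<int> Counted(IEnumerable<int> source)
    {
        foreach (var item in source)
        {
            itemsRead++;
            yield return item;
        }
    }

    // Same query shape as CartesianProduct above, specialized to int.
    static IEnumerable<IEnumerable<int>> Product(IEnumerable<IEnumerable<int>> sequences)
    {
        IEnumerable<IEnumerable<int>> emptyProduct = new[] { Enumerable.Empty<int>() };
        return sequences.Aggregate(
            emptyProduct,
            (accumulator, sequence) =>
                from accseq in accumulator
                from item in sequence
                select accseq.Concat(new[] { item }));
    }

    // Returns { reads after building the query, reads after enumerating it, row count }.
    public static int[] Run()
    {
        itemsRead = 0;
        var query = Product(new[] { Counted(new[] { 1, 2 }), Counted(new[] { 3, 4 }) });
        int afterBuilding = itemsRead;   // the query has only been built so far

        var rows = query.Select(row => row.ToList()).ToList();   // executes the query
        return new[] { afterBuilding, itemsRead, rows.Count };
    }

    public static void Main()
    {
        var counts = Run();
        Console.WriteLine("items read after building:    " + counts[0]);
        Console.WriteLine("items read after enumerating: " + counts[1]);
        Console.WriteLine("rows produced:                " + counts[2]);
    }
}
```

The first number should be zero: Aggregate walks the outer sequence right away, but the inner sequences are only read once something enumerates the result.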

Suppose we are taking the Cartesian product of the sequence of sequences {{1, 2}, {3, 4}, {5, 6}}. The accumulator starts off as a sequence containing one empty sequence: { { } }

On the first accumulation, accumulator is { { } } and sequence is {1, 2}. We do this:

from accseq in accumulator
from item in sequence 
select accseq.Concat(new[] {item})

So we are taking the Cartesian product of { { } } with {1, 2}, and for each pair, we concatenate. We have the pair ({ }, 1), so we concatenate { } and {1} to get {1}. We have the pair ({ }, 2), so we concatenate { } and {2} to get {2}. Therefore we have {{1}, {2}} as the result.

So on the second accumulation, accumulator is {{1}, {2}} and sequence is {3, 4}. Again, we compute the Cartesian product of these two sequences to get:

 {({1}, 3), ({1}, 4), ({2}, 3), ({2}, 4)}

and then from those items, concatenate the second one onto the first. So the result is the sequence {{1, 3}, {1, 4}, {2, 3}, {2, 4}}, which is what we want.

Now we accumulate again. We take the Cartesian product of the accumulator with {5, 6} to get

 {({1, 3}, 5), ({1, 3}, 6), ({1, 4}, 5), ...

and then concatenate the second item onto the first to get:

{{1, 3, 5}, {1, 3, 6}, {1, 4, 5}, {1, 4, 6}, ... }

and we're done. We've accumulated the Cartesian product.
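The walkthrough above can be reproduced with a short sketch that prints the accumulator after every step. Note the assumption: unlike the real method, this version materializes each stage with ToList so it can be displayed, which gives up the deferred execution described earlier. AccumulationTrace, Show and Stages are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AccumulationTrace
{
    // Formats a sequence of sequences in the brace notation used above.
    public static string Show(IEnumerable<IEnumerable<int>> product)
    {
        return "{" + string.Join(", ",
            product.Select(s => "{" + string.Join(", ", s) + "}")) + "}";
    }

    // Returns the accumulator after each step, starting with the seed.
    public static List<string> Stages(int[][] sequences)
    {
        IEnumerable<IEnumerable<int>> accumulator = new[] { Enumerable.Empty<int>() };
        var stages = new List<string> { Show(accumulator) };
        foreach (var sequence in sequences)
        {
            // One accumulation step, evaluated immediately for display.
            accumulator =
                (from accseq in accumulator
                 from item in sequence
                 select accseq.Concat(new[] { item })).ToList();
            stages.Add(Show(accumulator));
        }
        return stages;
    }

    public static void Main()
    {
        var input = new[] { new[] { 1, 2 }, new[] { 3, 4 }, new[] { 5, 6 } };
        foreach (var stage in Stages(input))
            Console.WriteLine(stage);
    }
}
```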

Now that we have a utility function that can take the Cartesian product of arbitrarily many sequences, the rest is easy by comparison:

var arr1 = new[] {"a", "b", "c"};
var arr2 = new[] { 3, 2, 4 };
var result = from cpLine in CartesianProduct(
                 from count in arr2 select Enumerable.Range(1, count)) 
             select cpLine.Zip(arr1, (x1, x2) => x2 + x1);

And now we have a sequence of sequences of strings, one sequence of strings per line:

foreach (var line in result)
{
    foreach (var s in line)
        Console.Write(s);
    Console.WriteLine();
}
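Put together, the pieces above might look like the following complete program. This is a sketch under a few assumptions: the wrapper class CombinationsDemo and the helper Lines are names I made up, and CartesianProduct is repeated so that the file compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CombinationsDemo
{
    // The CartesianProduct method from above, repeated for self-containment.
    static IEnumerable<IEnumerable<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)
    {
        IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
        return sequences.Aggregate(
            emptyProduct,
            (accumulator, sequence) =>
                from accseq in accumulator
                from item in sequence
                select accseq.Concat(new[] { item }));
    }

    // Builds one string per combination, e.g. "a1b1c1".
    public static List<string> Lines(string[] letters, int[] counts)
    {
        var result = from cpLine in CartesianProduct(
                         from count in counts select Enumerable.Range(1, count))
                     select cpLine.Zip(letters, (number, letter) => letter + number);
        return result.Select(line => string.Concat(line)).ToList();
    }

    public static void Main()
    {
        var lines = Lines(new[] { "a", "b", "c" }, new[] { 3, 2, 4 });
        foreach (var line in lines)
            Console.WriteLine(line);            // a1b1c1, a1b1c2, ..., a3b2c4
        Console.WriteLine(lines.Count);         // 3 * 2 * 4 = 24
    }
}
```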

Easy peasy!


using System;
using System.Text;

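// Iterative alternative without LINQ: "current" works like an odometer whose
// rightmost counter advances fastest. Place this method inside a class of your choice.
public static string[] GenerateCombinations(string[] Array1, int[] Array2)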
{
    if(Array1 == null) throw new ArgumentNullException("Array1");
    if(Array2 == null) throw new ArgumentNullException("Array2");
    if(Array1.Length != Array2.Length)
        throw new ArgumentException("Must be the same size as Array1.", "Array2");

    if(Array1.Length == 0)
        return new string[0];

    int outputSize = 1;
    var current = new int[Array1.Length];
    for(int i = 0; i < current.Length; ++i)
    {
        if(Array2[i] < 1)
            throw new ArgumentException("Contains invalid values.", "Array2");
        if(Array1[i] == null)
            throw new ArgumentException("Contains null values.", "Array1");
        outputSize *= Array2[i];
        current[i] = 1;
    }

    var result = new string[outputSize];
    for(int i = 0; i < outputSize; ++i)
    {
        var sb = new StringBuilder();
        for(int j = 0; j < current.Length; ++j)
        {
            sb.Append(Array1[j]);
            sb.Append(current[j].ToString());
            if(j != current.Length - 1)
                sb.Append(' ');
        }
        result[i] = sb.ToString();
        int incrementIndex = current.Length - 1;
        while(incrementIndex >= 0 && current[incrementIndex] == Array2[incrementIndex])
        {
            current[incrementIndex] = 1;
            --incrementIndex;
        }
        if(incrementIndex >= 0)
            ++current[incrementIndex];
    }
    return result;
}

Alternative solution:

Step one: read my series of articles on how to generate all strings which match a context sensitive grammar:

http://blogs.msdn.com/b/ericlippert/archive/tags/grammars/

Step two: define a grammar that generates the language you want. For example, you could define the grammar:

S: a A b B c C
A: 1 | 2 | 3
B: 1 | 2
C: 1 | 2 | 3 | 4

Clearly you can easily generate that grammar definition string from your two arrays. Then feed that into the code which generates all strings in a given grammar, and you're done; you'll get all the possibilities. (Not necessarily in the order you want them in, mind you.)
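Generating the grammar definition string could be sketched like this. Two assumptions to flag: the output simply mirrors the notation shown above (the exact input format expected by the generator in the articles is not reproduced here), and each letter is taken to be a distinct lowercase terminal whose upper-case form serves as the nonterminal name. GrammarBuilder and Build are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GrammarBuilder
{
    // Builds the start production plus one production per letter.
    // Assumption: letters[i].ToUpperInvariant() is a usable nonterminal name.
    public static string Build(string[] letters, int[] counts)
    {
        var lines = new List<string>();

        // Start symbol: each terminal followed by its nonterminal, e.g. "a A b B c C".
        lines.Add("S: " + string.Join(" ",
            letters.Select(l => l + " " + l.ToUpperInvariant())));

        // One production per letter listing the numbers 1..count as alternatives.
        for (int i = 0; i < letters.Length; i++)
        {
            var alternatives = Enumerable.Range(1, counts[i]).Select(n => n.ToString());
            lines.Add(letters[i].ToUpperInvariant() + ": " + string.Join(" | ", alternatives));
        }
        return string.Join("\n", lines);
    }

    public static void Main()
    {
        Console.WriteLine(Build(new[] { "a", "b", "c" }, new[] { 3, 2, 4 }));
    }
}
```

With the letters a, b, c and the counts 3, 2, 4 this should print the four-line grammar shown above.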


For another solution that is not LINQ-based, you can use:

using System;
using System.Collections.Generic;
using System.Linq;

public class CartesianProduct<T>
    {
        int[] lengths;
        T[][] arrays;
        public CartesianProduct(params T[][] arrays)
        {
            lengths = arrays.Select(k => k.Length).ToArray();
            if (lengths.Any(l => l == 0))
                throw new ArgumentException("Zero length array unhandled.");
            this.arrays = arrays;
        }
        public IEnumerable<T[]> Get()
        {
            // walk[i] is the current index into arrays[i].
            int[] walk = new int[arrays.Length];
            // The indexed Select overload pairs each index k with its array position i.
            yield return walk.Select((k, i) => arrays[i][k]).ToArray();
            while (Next(walk))
            {
                yield return walk.Select((k, i) => arrays[i][k]).ToArray();
            }

        }
        private bool Next(int[] walk)
        {
            int whoIncrement = 0;
            while (whoIncrement < walk.Length)
            {
                if (walk[whoIncrement] < lengths[whoIncrement] - 1)
                {
                    walk[whoIncrement]++;
                    return true;
                }
                else
                {
                    walk[whoIncrement] = 0;
                    whoIncrement++;
                }
            }
            return false;
        }
    }

You can find an example on how to use it here.
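In case it helps, here is a small usage sketch of my own. To keep it compilable on its own it carries a condensed copy of the class; CartesianProductUsage and Lines are hypothetical names, and the input arrays are made-up sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Condensed copy of the class above so that this sketch compiles on its own.
public class CartesianProduct<T>
{
    readonly int[] lengths;
    readonly T[][] arrays;

    public CartesianProduct(params T[][] arrays)
    {
        lengths = arrays.Select(a => a.Length).ToArray();
        if (lengths.Any(l => l == 0))
            throw new ArgumentException("Zero length array unhandled.");
        this.arrays = arrays;
    }

    public IEnumerable<T[]> Get()
    {
        var walk = new int[arrays.Length];
        do
        {
            yield return walk.Select((k, i) => arrays[i][k]).ToArray();
        } while (Next(walk));
    }

    // Advances the index vector like an odometer whose first digit moves fastest.
    bool Next(int[] walk)
    {
        for (int i = 0; i < walk.Length; i++)
        {
            if (walk[i] < lengths[i] - 1) { walk[i]++; return true; }
            walk[i] = 0;
        }
        return false;
    }
}

public static class CartesianProductUsage
{
    // Joins every combination of the two sample arrays into one line.
    public static List<string> Lines()
    {
        var product = new CartesianProduct<string>(
            new[] { "a1", "a2", "a3" },
            new[] { "b1", "b2" });
        return product.Get().Select(row => string.Join(" ", row)).ToList();
    }

    public static void Main()
    {
        foreach (var line in Lines())
            Console.WriteLine(line);
    }
}
```

Because the first index advances fastest, the first array cycles through all of its values before the second one moves; this is a different order from the LINQ version.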


Using Enumerable.Append, which was added in .NET Framework 4.7.1, @EricLippert's answer can be implemented without allocating a new array at each iteration:

public static IEnumerable<IEnumerable<T>> CartesianProduct<T>
    (this IEnumerable<IEnumerable<T>> enumerables)
{
    IEnumerable<IEnumerable<T>> Seed() { yield return Enumerable.Empty<T>(); }

    return enumerables.Aggregate(Seed(), (accumulator, enumerable)
        => accumulator.SelectMany(x => enumerable.Select(x.Append)));
}

I'm not willing to give you the complete source code, so here's the idea behind it.

You can generate the elements the following way:

I assume A=(a1, a2, ..., an) and B=(b1, b2, ..., bn) (so A and B each hold n elements).

Then do it recursively! Write a method that takes an A and a B and does your stuff:

If A and B each contain just one element (called an and bn, respectively), just iterate from 1 to bn and concatenate an with the iteration variable.

If A and B each contain more than one element, grab the first elements (a1 and b1, respectively), iterate from 1 to b1, and do the following for each iteration step:

  • call the method recursively with the subarrays of A and B starting at the second element, i.e. A'=(a2, a3, ..., an) and B'=(b2, b3, ..., bn). For every element generated by the recursive call, concatenate a1, the iteration variable, and the generated element from the recursive call.

Here you can find an analogous example of how to generate things in C#; you "just" have to adapt it to your needs.
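For readers who want something to start from, the recursive idea described above might be sketched roughly as follows. This is one possible reading, not a definitive implementation: it passes a start index instead of copying subarrays, and it assumes both arrays are non-empty and of equal length. RecursiveCombinations and Generate are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecursiveCombinations
{
    // Generates every combination for letters[start..] and counts[start..].
    // Assumption: both arrays are non-empty and have the same length.
    public static IEnumerable<string> Generate(string[] letters, int[] counts, int start = 0)
    {
        bool isLast = start == letters.Length - 1;
        for (int i = 1; i <= counts[start]; i++)
        {
            string head = letters[start] + i;   // e.g. "a" + 1 gives "a1"
            if (isLast)
            {
                yield return head;              // base case: one element left
            }
            else
            {
                // Recursive case: prefix every combination of the remaining elements.
                foreach (var tail in Generate(letters, counts, start + 1))
                    yield return head + tail;
            }
        }
    }

    public static void Main()
    {
        // Prints a1b1, a1b2, a2b1, a2b2, one per line.
        foreach (var s in Generate(new[] { "a", "b" }, new[] { 2, 2 }))
            Console.WriteLine(s);
    }
}
```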