I'm getting System.OutOfMemoryException
when trying to generate 6 letter permutations. 5 letter permutations still work.
Here is the code I'm using to generate ALL permutations:
private static List<string> getPermutations(int n,string source) { IEnumerable<string> q = source.Select(x => x.ToString()); for (int i = 0; i < n - 1; i++) { q = q.SelectMany(x => source, (x, y) => x + y); } return q.ToList(); // THIS IS WHERE THE ERROR HAPPENS }
after which I am using this piece of code to filter them based on regex:
private static List<string> filterListByRegex(List<string> list, string regex) { List<string> newList = list.ToList(); for (int i = newList.Count - 1; i >= 0; i--) { Match match = Regex.Match(""+newList[i], regex, RegexOptions.IgnoreCase); if (!match.Success) { newList.RemoveAt(i); } } return newList; }
so since I don't really need ALL those permutations, is there a way to regex filter while generating permutations, or should I use a more efficient piece of code to generate permutations?
Here is a picture to better demonstrate what I'm trying to achieve:
The vertical alphabet string is the one I'm telling the code to use.
First, I would like to mention that what we are discussing here is not real permutations, but so called n-tuples
or permutations with repetition
- Wikipedia.
Second, regarding the System.OutOfMemoryException when generating permutations
, I think we all agree that the result should not be stored in a list, but provided as enumerable which will allow applying filtering and consuming (eventually storing) only the ones in interest.
In that regard the LINQ solution provided by @juharr is performing very well. So my goals are to minimize the intermediary memory allocations, including string concatenations and also to end up with a more general and faster solution.
In order to do that, I need to take some hard design decision. The signature of the general function I'm talking about will look like this
public static IEnumerable<T[]> RepeatingPermutations<T>(this T[] set, int N)
and the question is what should be the array yielded. If we follow the recomendations, they should be a separate array instances. However, remember I want to minimize allocations, I've decided to break that rules and yield one and the same array instance, moving the responsibility of not modifying it and cloning it if necessary to the caller. For instance, this allows the caller to perform no cost filtering. Or implement the OP function on top on it like this
public static IEnumerable<string> RepeatingPermutations(this string set, int N) { return set.ToCharArray().RepeatingPermutations(N).Select(p => new string(p)); }
A few words about the algorithm. Instead of looking at the problem recursively as some other answerers, I want to effectively implement the equivalent of something like this
from e1 in set from e2 in set ... from eN in set select new [] { e1, e2, .., eN }
Interestingly, I recently answered a combinations related question and realized that the algorithms are pretty much the same.
With all that being said, here is the function:
public static IEnumerable<T[]> RepeatingPermutations<T>(this T[] set, int N) { var result = new T[N]; var indices = new int[N]; for (int pos = 0, index = 0; ;) { for (; pos < N; pos++, index = 0) { indices[pos] = index; result[pos] = set[index]; } yield return result; do { if (pos == 0) yield break; index = indices[--pos] + 1; } while (index >= set.Length); } }
I've did some tests by simply calling the different functions with N=2,3,..6 and simply iterating and counting. Here are the results on my machine:
A : N=2 Count= 676 Time=00:00:00.0000467 Memory= 29K B1: N=2 Count= 676 Time=00:00:00.0000263 Memory= 16K B2: N=2 Count= 676 Time=00:00:00.0000189 Memory= 8K A : N=3 Count= 17,576 Time=00:00:00.0010107 Memory= 657K B1: N=3 Count= 17,576 Time=00:00:00.0003673 Memory= 344K B2: N=3 Count= 17,576 Time=00:00:00.0001415 Memory= 8K A : N=4 Count= 456,976 Time=00:00:00.0184445 Memory= 2,472K B1: N=4 Count= 456,976 Time=00:00:00.0096189 Memory= 2,520K B2: N=4 Count= 456,976 Time=00:00:00.0033624 Memory= 8K A : N=5 Count= 11,881,376 Time=00:00:00.4281349 Memory= 397K B1: N=5 Count= 11,881,376 Time=00:00:00.2482835 Memory= 4,042K B2: N=5 Count= 11,881,376 Time=00:00:00.0887759 Memory= 8K A : N=6 Count= 308,915,776 Time=00:00:11.2697326 Memory= 1,688K B1: N=6 Count= 308,915,776 Time=00:00:06.5638404 Memory= 1,024K B2: N=6 Count= 308,915,776 Time=00:00:02.2674431 Memory= 8K
where
A - LINQ function from @juharr
B1 - my function with string
B2 - my function with char[]
As we can see, memory wise both string functions are comparable. Performance wise the LINQ function is only ~2 times slower, which is pretty good result.
As expected in such scenario, the non allocating function significantly outperforms them both.
UPDATE: As requested in the comments, here is the sample usage of the above functions (note that they are extension methods and must be placed in a static class of your choice):
var charSet = Enumerable.Range('A', 'Z' - 'A' + 1).Select(c => (char)c).ToArray(); var charPermutations = charSet.RepeatingPermutations(3); var stringSet = new string(charset); var stringPermutations = stringSet.RepeatingPermutations(3);
However, remember the design choice I've made, so if you expand the charPermutations
inside the debugger, you'll see one and the same values (the last permutation). Consuming the whole result of the above call for char[]
should be like this
var charPermutationList = charSet.RepeatingPermutations(3) .Select(p => (char[])p.Clone()).ToList();
Actually a good addition to the two methods presented would be the following extension method:
public static IEnumerable<T[]> Clone<T>(this IEnumerable<T[]> source) { return source.Select(item => (T[])item.Clone()); }
so the consuming call would be simple
var charPermutationList = charSet.RepeatingPermutations(3).Clone().ToList();
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With