Problem:
Given a Seq
seq
and an Int
n
.
I basically want all combinations of the elements up to size n. The arrangement matters, meaning e.g. [1,2] is different that [2,1].
def combinations[T](seq: Seq[T], size: Int) = ...
Example:
combinations(List(1,2,3), 0)
//Seq(Seq())
combinations(List(1,2,3), 1)
//Seq(Seq(), Seq(1), Seq(2), Seq(3))
combinations(List(1,2,3), 2)
//Seq(Seq(), Seq(1), Seq(2), Seq(3), Seq(1,2), Seq(2,1), Seq(1,3), Seq(3,1),
//Seq(2,3), Seq(3,2))
...
What I have so far:
def combinations[T](seq: Seq[T], size: Int) = {
@tailrec
def inner(seq: Seq[T], soFar: Seq[Seq[T]]): Seq[Seq[T]] = seq match {
case head +: tail => inner(tail, soFar ++ {
val insertList = Seq(head)
for {
comb <- soFar
if comb.size < size
index <- 0 to comb.size
} yield comb.patch(index, insertList, 0)
})
case _ => soFar
}
inner(seq, IndexedSeq(IndexedSeq.empty))
}
What would be your approach to this problem? This method will be called a lot and therefore it should be made most efficient.
There are methods in the library like subsets
or combinations
(yea I chose the same name), which return iterators. I also thought about that, but I have no idea yet how to design this lazily.
Not sure if this is efficient enough for your purpose but it's the simplest approach.
def combinations[T](seq: Seq[T], size: Int) : Seq[Seq[T]] = {
(1 to size).flatMap(i => seq.combinations(i).flatMap(_.permutations))
}
edit: to make it lazy you can use a view
def combinations[T](seq: Seq[T], size: Int) : Iterable[Seq[T]] = {
(1 to size).view.flatMap(i => seq.combinations(i).flatMap(_.permutations))
}
From permutations theory we know that the number of permutations of K elements taken from a set of N elements is
N! / (N - K)!
(see http://en.wikipedia.org/wiki/Permutation)
Therefore if you wanna build them all, you will have
algorithm complexity = number of permutations * cost of building each permutation
The potential optimization of the algorithm lies in minimizing the cost of building each permutation, by using a data structure that has some appending
/ prepending
operation that runs in O(1)
.
You are using an IndexedSeq
, which is a collection optimized for O(1) random access. When collections are optimized for random access they are backed by arrays. When using such collections (this is also valid for java ArrayList
) you give up the guarantee of a low cost insertion operation: sometimes the array won't be big enough and the collection will have to create a new one and copy all the elements.
When using instead linked lists
(such as scala List
, which is the default implementation for Seq) you take the opposite choice: you give up constant time access for constant time insertion. In particular, scala List
is a recursive data structure with constant time insertion at the front.
So if you are looking for high performance and you need the collection to be available eagerly, use a Seq.empty
instead of IndexedSeq.empty
and at each iteration prepend the new element at the head of the Seq
. If you need something lazy, use Stream
which will minimize memory occupation. Additional strategies worth exploring is to create an empty IndexedSeq for your first iteration, but not through Indexed.empty. Use instead the builder and try to provide an array which has the right size (N! / (N-K)!)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With