Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combination of elements

Tags:

scala

Problem:

Given a Seq seq and an Int n.

I basically want all combinations of the elements up to size n. The arrangement matters, meaning e.g. [1,2] is different that [2,1].

def combinations[T](seq: Seq[T], size: Int) = ...

Example:

combinations(List(1,2,3), 0) 
//Seq(Seq())

combinations(List(1,2,3), 1)
//Seq(Seq(), Seq(1), Seq(2), Seq(3))

combinations(List(1,2,3), 2) 
//Seq(Seq(), Seq(1), Seq(2), Seq(3), Seq(1,2), Seq(2,1), Seq(1,3), Seq(3,1),
//Seq(2,3), Seq(3,2))

...

What I have so far:

def combinations[T](seq: Seq[T], size: Int) = {
 @tailrec
  def inner(seq: Seq[T], soFar: Seq[Seq[T]]): Seq[Seq[T]] = seq match {
    case head +: tail => inner(tail, soFar ++ {
      val insertList = Seq(head)
      for {
        comb <- soFar
        if comb.size < size
        index <- 0 to comb.size
      } yield comb.patch(index, insertList, 0)
    })
    case _ => soFar
  }

  inner(seq, IndexedSeq(IndexedSeq.empty))
}

What would be your approach to this problem? This method will be called a lot and therefore it should be made most efficient.

There are methods in the library like subsets or combinations (yea I chose the same name), which return iterators. I also thought about that, but I have no idea yet how to design this lazily.

like image 480
Kigyo Avatar asked Aug 02 '14 22:08

Kigyo


2 Answers

Not sure if this is efficient enough for your purpose but it's the simplest approach.

def combinations[T](seq: Seq[T], size: Int) : Seq[Seq[T]] = {
  (1 to size).flatMap(i => seq.combinations(i).flatMap(_.permutations))
}

edit: to make it lazy you can use a view

def combinations[T](seq: Seq[T], size: Int) : Iterable[Seq[T]] = {
  (1 to size).view.flatMap(i => seq.combinations(i).flatMap(_.permutations))
}
like image 125
SpiderPig Avatar answered Sep 20 '22 10:09

SpiderPig


From permutations theory we know that the number of permutations of K elements taken from a set of N elements is

N! / (N - K)!

(see http://en.wikipedia.org/wiki/Permutation)

Therefore if you wanna build them all, you will have

algorithm complexity = number of permutations * cost of building each permutation

The potential optimization of the algorithm lies in minimizing the cost of building each permutation, by using a data structure that has some appending / prepending operation that runs in O(1).

You are using an IndexedSeq, which is a collection optimized for O(1) random access. When collections are optimized for random access they are backed by arrays. When using such collections (this is also valid for java ArrayList) you give up the guarantee of a low cost insertion operation: sometimes the array won't be big enough and the collection will have to create a new one and copy all the elements.

When using instead linked lists (such as scala List, which is the default implementation for Seq) you take the opposite choice: you give up constant time access for constant time insertion. In particular, scala List is a recursive data structure with constant time insertion at the front.

So if you are looking for high performance and you need the collection to be available eagerly, use a Seq.empty instead of IndexedSeq.empty and at each iteration prepend the new element at the head of the Seq. If you need something lazy, use Stream which will minimize memory occupation. Additional strategies worth exploring is to create an empty IndexedSeq for your first iteration, but not through Indexed.empty. Use instead the builder and try to provide an array which has the right size (N! / (N-K)!)

like image 44
Edmondo1984 Avatar answered Sep 22 '22 10:09

Edmondo1984