Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala - merging multiple iterators

Tags:

iterator

scala

I have multiple iterators which return items in a sorted manner according to some sorting criterion. Now, I would like to merge (multiplex) the iterators into one, combined iterator. I know how to do it in Java style, with e.g. tree-map, but I was wondering if there is a more functional approach? I want to preserve the laziness of the iterators as much as possible.

like image 634
Bober02 Avatar asked May 01 '13 08:05

Bober02


3 Answers

You can just do:

val it = iter1 ++ iter2

It creates another iterator and does not evaluate the elements, but wraps the two existing iterators. It is fully lazy, so you are not supposed to use iter1 or iter2 once you do this.

In general, if you have more iterators to merge, you can use folding:

val iterators: Seq[Iterator[T]] = ???
val it = iterators.foldLeft(Iterator[T]())(_ ++ _)

If you have some ordering on the elements that you would like to maintain in the resulting iterator but you want lazyness, you can convert them to streams:

def merge[T: Ordering](iter1: Iterator[T], iter2: Iterator[T]): Iterator[T] = {
  val s1 = iter1.toStream
  val s2 = iter2.toStream

  def mergeStreams(s1: Stream[T], s2: Stream[T]): Stream[T] = {
    if (s1.isEmpty) s2
    else if (s2.isEmpty) s1
    else if (s1.head < s2.head) s1.head #:: mergeStreams(s1.tail, s2)
    else s2.head #:: mergeStreams(s1, s2.tail)
  }

  mergeStreams(s1, s2).iterator
}

Not necessarily faster though, you should microbenchmark this.

A possible alternative is to use buffered iterators to achieve the same effect.

like image 92
axel22 Avatar answered Nov 19 '22 05:11

axel22


Like @axel22 mentioned, you can do this with BufferedIterators. Here's one Stream-free solution:

def combine[T](rawIterators: List[Iterator[T]])(implicit cmp: Ordering[T]): Iterator[T] = {
  new Iterator[T] {
    private val iterators: List[BufferedIterator[T]] = rawIterators.map(_.buffered)

    def hasNext: Boolean = iterators.exists(_.hasNext)

    def next(): T = if (hasNext) {
      iterators.filter(_.hasNext).map(x => (x.head, x)).minBy(_._1)(cmp)._2.next()
    } else {
      throw new UnsupportedOperationException("Cannot call next on an exhausted iterator!")
    }
}
like image 32
sralmai Avatar answered Nov 19 '22 05:11

sralmai


You could try:

(iterA ++ iterB).toStream.sorted.toIterator

For example:

val i1 = (1 to 100 by 3).toIterator
val i2 = (2 to 100 by 3).toIterator
val i3 = (3 to 100 by 3).toIterator

val merged = (i1 ++ i2 ++ i3).toStream.sorted.toIterator

merged.next  // results in: 1
merged.next  // results in: 2
merged.next  // results in: 3
like image 3
Keith Pinson Avatar answered Nov 19 '22 05:11

Keith Pinson