I have an Iterator[(A1, B1)] and two functions, fA: Iterator[A1] => Iterator[A2] and fB: Iterator[B1] => Iterator[B2]. Is it possible to make an fAB: Iterator[(A1, B1)] => Iterator[(A2, B2)] without converting the iterators to Seq?
Both answers below are good. I selected @Aivean's answer because the code is simpler and it uses a specialized Scala data structure (Stream). The only drawback is the stack-overflow limitation, but it shouldn't be a problem for most use cases. If your iterator can be very (very) long, then @Alexey's solution should be preferred.
No. Your hypothetical function has to call one of fA and fB first. Let's say it calls fA, and fA requests all the A1s before producing anything. Then you don't have any B1s remaining to pass to fB, unless you save them somewhere, potentially leaking memory. If that's acceptable, you can do:
import scala.collection.immutable.Queue

def unzip[A, B](iter: Iterator[(A, B)]) = {
  var qA = Queue.empty[A]
  var qB = Queue.empty[B]
  val iterA = new Iterator[A] {
    override def hasNext = qA.nonEmpty || iter.hasNext
    override def next() = qA.dequeueOption match {
      case Some((a, qA1)) => // an A was buffered while iterB ran ahead
        qA = qA1
        a
      case None =>           // pull a fresh pair, hand out the A, buffer the B
        val (a, b) = iter.next()
        qB = qB.enqueue(b)
        a
    }
  }
  // iterB mirrors iterA, with the roles of the two queues swapped
  val iterB = new Iterator[B] {
    override def hasNext = qB.nonEmpty || iter.hasNext
    override def next() = qB.dequeueOption match {
      case Some((b, qB1)) =>
        qB = qB1
        b
      case None =>
        val (a, b) = iter.next()
        qA = qA.enqueue(a)
        b
    }
  }
  (iterA, iterB)
}
and then

val (iterA, iterB) = unzip(iterator)
fA(iterA).zip(fB(iterB))
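For a concrete sanity check, here is a hypothetical usage with stand-in functions playing the roles of fA and fB (the input pairs, the doubling, and the upper-casing are illustrative assumptions, not from the original answer):

val pairs = Iterator((1, "a"), (2, "b"), (3, "c"))
val (as, bs) = unzip(pairs)
val a2 = as.map(_ * 2)          // plays the role of fA
val b2 = bs.map(_.toUpperCase)  // plays the role of fB
a2.zip(b2).toList               // List((2,"A"), (4,"B"), (6,"C"))

Here zip alternates between the two sides, so the queues never grow beyond one buffered element.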
(Well, you can also write iterator => fA(iterator.map(_._1)).zip(fB(iterator.map(_._2))): it has the right type, but it is probably not what you want. Namely, it will use only one field of each tuple produced by the original iterator, and drop the other.)
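A quick sketch of that pitfall (hypothetical input; the output follows from both map calls sharing the same underlying iterator):

val it = Iterator((1, "a"), (2, "b"), (3, "c"), (4, "d"))
// both sides pull from the same iterator, so they steal alternate tuples
it.map(_._1).zip(it.map(_._2)).toList  // List((1,"b"), (3,"d"))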
I came up with a much simpler implementation:
def iterUnzip[A1, B1, A2, B2](it: Iterator[(A1, B1)],
                              fA: (Iterator[A1]) => Iterator[A2],
                              fB: (Iterator[B1]) => Iterator[B2]) =
  it.toStream match {
    case s => fA(s.map(_._1).toIterator).zip(fB(s.map(_._2).toIterator))
  }
The idea is to convert the iterator to a Stream. Stream in Scala is lazy but also memoizes its elements. This effectively provides the same buffering mechanism as in @AlexeyRomanov's solution, but is more concise. The only drawback is that Stream stores the memoized elements on the stack, as opposed to an explicit Queue, so if fA and fB produce elements at an uneven rate, you may get a StackOverflowError.
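A minimal illustration of that memoization (a sketch assuming the pre-2.13 Stream, whose head is strict and tail is lazy):

val s = Stream.from(0).map { x => println("eval " + x); x }  // prints eval 0: the head is strict
s(2)  // prints eval 1, eval 2
s(2)  // prints nothing: the elements are memoized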
A test showing that evaluation is indeed lazy:
val iter = Stream.from(0).map(x => (x, x + 1))
  .map(x => { println("fetched: " + x); x }).take(5).toIterator
iterUnzip(
  iter,
  (_: Iterator[Int]).flatMap(x => List(x, x)),
  (_: Iterator[Int]).map(_ + 1)
).toList
Result:
fetched: (0,1)
iter: Iterator[(Int, Int)] = non-empty iterator
fetched: (1,2)
fetched: (2,3)
fetched: (3,4)
fetched: (4,5)
res0: List[(Int, Int)] = List((0,2), (0,3), (1,4), (1,5), (2,6))
I also tried reasonably hard to get a StackOverflowError by producing uneven iterators, but failed.
val iter = Stream.from(0).map(x => (x, x + 1)).take(10000000).toIterator
iterUnzip(
  iter,
  (_: Iterator[Int]).flatMap(x => List.fill(1000000)(x)),
  (_: Iterator[Int]).map(_ + 1)
).size
It works fine with -Xss5m and produces:
res10: Int = 10000000
So, overall, this is a reasonably good and concise solution, unless you have some extreme use cases.
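If you want this in exactly the fAB shape from the question, a thin wrapper does it (a sketch; the name fAB and the currying are my framing, not from the answer):

def fAB[A1, B1, A2, B2](fA: Iterator[A1] => Iterator[A2],
                        fB: Iterator[B1] => Iterator[B2]): Iterator[(A1, B1)] => Iterator[(A2, B2)] =
  it => iterUnzip(it, fA, fB)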