I have an Iterator[(A1, B1)] and two functions, fA: Iterator[A1] => Iterator[A2] and fB: Iterator[B1] => Iterator[B2]. Is it possible to make an fAB: Iterator[(A1, B1)] => Iterator[(A2, B2)] without converting the iterators to Seq?
Both answers below are good. I selected @Aivean's answer because the code is simpler and it uses a specialized Scala data structure (Stream). The only drawback is the stack-overflow limitation, but it shouldn't be a problem for most use cases. If your iterator can be very (very) long, then @Alexey's solution should be preferred.
No. Your hypothetical function has to call one of fA and fB first. Let's say it calls fA, and fA requests all the A1s before producing anything. Then you don't have any B1s remaining to pass to fB, unless you save them somewhere, potentially leaking memory. If that's acceptable, you can do:
import scala.collection.immutable.Queue

def unzip[A, B](iter: Iterator[(A, B)]) = {
  var qA = Queue.empty[A]
  var qB = Queue.empty[B]
  val iterA = new Iterator[A] {
    override def hasNext = qA.nonEmpty || iter.hasNext
    override def next() = qA.dequeueOption match {
      case Some((a, qA1)) => // an A was buffered while iterB ran ahead
        qA = qA1
        a
      case None =>           // pull a fresh pair, hand out the A, buffer the B
        val (a, b) = iter.next()
        qB = qB.enqueue(b)
        a
    }
  }
  // iterB mirrors iterA, with the roles of the two queues swapped
  val iterB = new Iterator[B] {
    override def hasNext = qB.nonEmpty || iter.hasNext
    override def next() = qB.dequeueOption match {
      case Some((b, qB1)) =>
        qB = qB1
        b
      case None =>
        val (a, b) = iter.next()
        qA = qA.enqueue(a)
        b
    }
  }
  (iterA, iterB)
}
and then

val (iterA, iterB) = unzip(iterator)
fA(iterA).zip(fB(iterB))
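For a concrete sanity check, here is a hypothetical usage with stand-in functions playing the roles of fA and fB (the input pairs, the doubling, and the upper-casing are illustrative assumptions, not from the original answer):

val pairs = Iterator((1, "a"), (2, "b"), (3, "c"))
val (as, bs) = unzip(pairs)
val a2 = as.map(_ * 2)          // plays the role of fA
val b2 = bs.map(_.toUpperCase)  // plays the role of fB
a2.zip(b2).toList               // List((2,"A"), (4,"B"), (6,"C"))

Here zip alternates between the two sides, so the queues never grow beyond one buffered element.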
(Well, you can also write iterator => fA(iterator.map(_._1)).zip(fB(iterator.map(_._2))): it has the right type, but it is probably not what you want. Namely, it will use only one field of each tuple produced by the original iterator, and drop the other.)
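A quick sketch of that pitfall (hypothetical input; the output follows from both map calls sharing the same underlying iterator):

val it = Iterator((1, "a"), (2, "b"), (3, "c"), (4, "d"))
// both sides pull from the same iterator, so they steal alternate tuples
it.map(_._1).zip(it.map(_._2)).toList  // List((1,"b"), (3,"d"))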
I came up with a much simpler implementation:
def iterUnzip[A1, B1, A2, B2](it: Iterator[(A1, B1)],
                              fA: (Iterator[A1]) => Iterator[A2],
                              fB: (Iterator[B1]) => Iterator[B2]) =
  it.toStream match {
    case s => fA(s.map(_._1).toIterator).zip(fB(s.map(_._2).toIterator))
  }
The idea is to convert the iterator to a Stream. Stream in Scala is lazy but also memoizes its elements. This effectively provides the same buffering mechanism as in @AlexeyRomanov's solution, but is more concise. The only drawback is that Stream stores the memoized elements on the stack, as opposed to an explicit Queue, so if fA and fB produce elements at an uneven rate, you may get a StackOverflowError.
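A minimal illustration of that memoization (a sketch assuming the pre-2.13 Stream, whose head is strict and tail is lazy):

val s = Stream.from(0).map { x => println("eval " + x); x }  // prints eval 0: the head is strict
s(2)  // prints eval 1, eval 2
s(2)  // prints nothing: the elements are memoized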
A test showing that evaluation is indeed lazy:
val iter = Stream.from(0).map(x => (x, x + 1))
  .map(x => { println("fetched: " + x); x }).take(5).toIterator
iterUnzip(
  iter,
  (_: Iterator[Int]).flatMap(x => List(x, x)),
  (_: Iterator[Int]).map(_ + 1)
).toList
Result:
fetched: (0,1)
iter: Iterator[(Int, Int)] = non-empty iterator
fetched: (1,2)
fetched: (2,3)
fetched: (3,4)
fetched: (4,5)
res0: List[(Int, Int)] = List((0,2), (0,3), (1,4), (1,5), (2,6))
I also tried reasonably hard to get a StackOverflowError by producing uneven iterators, but failed.
val iter = Stream.from(0).map(x => (x, x + 1)).take(10000000).toIterator
iterUnzip(
  iter,
  (_: Iterator[Int]).flatMap(x => List.fill(1000000)(x)),
  (_: Iterator[Int]).map(_ + 1)
).size
It works fine with -Xss5m and produces:
res10: Int = 10000000
So, overall, this is a reasonably good and concise solution, unless you have some extreme use cases.
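If you want this in exactly the fAB shape from the question, a thin wrapper does it (a sketch; the name fAB and the currying are my framing, not from the answer):

def fAB[A1, B1, A2, B2](fA: Iterator[A1] => Iterator[A2],
                        fB: Iterator[B1] => Iterator[B2]): Iterator[(A1, B1)] => Iterator[(A2, B2)] =
  it => iterUnzip(it, fA, fB)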