Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is filter in front of foldLeft slow in Scala?

Tags:

I wrote an answer to the first Project Euler question:

Add all the natural numbers below one thousand that are multiples of 3 or 5.

The first thing that came to me was:

(1 until 1000).filter(i => (i % 3 == 0 || i % 5 == 0)).foldLeft(0)(_ + _) 

but it's slow (it takes 125 ms), so I rewrote it, simply thinking of 'another way' versus 'the faster way'

(1 until 1000).foldLeft(0){     (total, x) =>         x match {             case i if (i % 3 == 0 || i % 5 ==0) => i + total // Add             case _ => total //skip         } } 

This is much faster (only 2 ms). Why? I'm guess the second version uses only the Range generator and doesn't manifest a fully realized collection in any way, doing it all in one pass, both faster and with less memory. Am I right?

Here the code on IdeOne: http://ideone.com/GbKlP

like image 418
andyczerwonka Avatar asked Jan 25 '11 16:01

andyczerwonka


People also ask

How does foldLeft work in Scala?

The foldLeft method takes an associative binary operator function as parameter and will use it to collapse elements from the collection. The order for traversing the elements in the collection is from left to right and hence the name foldLeft. The foldLeft method allows you to also specify an initial value.

What is foldLeft in spark?

The Scala foldLeft method can be used to iterate over a data structure and perform multiple operations on a Spark DataFrame. foldLeft can be used to eliminate all whitespace in multiple columns or convert all the column names in a DataFrame to snake_case.


1 Answers

The problem, as others have said, is that filter creates a new collection. The alternative withFilter doesn't, but that doesn't have a foldLeft. Also, using .view, .iterator or .toStream would all avoid creating the new collection in various ways, but they are all slower here than the first method you used, which I thought somewhat strange at first.

But, then... See, 1 until 1000 is a Range, whose size is actually very small, because it doesn't store each element. Also, Range's foreach is extremely optimized, and is even specialized, which is not the case of any of the other collections. Since foldLeft is implemented as a foreach, as long as you stay with a Range you get to enjoy its optimized methods.

(_: Range).foreach:

@inline final override def foreach[@specialized(Unit) U](f: Int => U) {     if (length > 0) {         val last = this.last         var i = start         while (i != last) {             f(i)             i += step         }         f(i)     } } 

(_: Range).view.foreach

def foreach[U](f: A => U): Unit =      iterator.foreach(f) 

(_: Range).view.iterator

override def iterator: Iterator[A] = new Elements(0, length)  protected class Elements(start: Int, end: Int) extends BufferedIterator[A] with Serializable {   private var i = start    def hasNext: Boolean = i < end    def next: A =      if (i < end) {       val x = self(i)       i += 1       x     } else Iterator.empty.next    def head =      if (i < end) self(i) else Iterator.empty.next    /** $super    *  '''Note:''' `drop` is overridden to enable fast searching in the middle of indexed sequences.    */   override def drop(n: Int): Iterator[A] =     if (n > 0) new Elements(i + n, end) else this    /** $super    *  '''Note:''' `take` is overridden to be symmetric to `drop`.    */   override def take(n: Int): Iterator[A] =     if (n <= 0) Iterator.empty.buffered     else if (i + n < end) new Elements(i, i + n)      else this } 

(_: Range).view.iterator.foreach

def foreach[U](f: A =>  U) { while (hasNext) f(next()) } 

And that, of course, doesn't even count the filter between view and foldLeft:

override def filter(p: A => Boolean): This = newFiltered(p).asInstanceOf[This]  protected def newFiltered(p: A => Boolean): Transformed[A] = new Filtered { val pred = p }  trait Filtered extends Transformed[A] {   protected[this] val pred: A => Boolean    override def foreach[U](f: A => U) {     for (x <- self)       if (pred(x)) f(x)   }   override def stringPrefix = self.stringPrefix+"F" } 
like image 132
Daniel C. Sobral Avatar answered Oct 12 '22 11:10

Daniel C. Sobral