In Scala, why does my Sieve algorithm run so slowly?

I'm trying to implement the Sieve of Eratosthenes using lists and filters rather than arrays and loops. I'm not sure why the following performs significantly worse than an imperative equivalent. A limit of 1 million should be no problem, but my machine grinds to a halt.

  val max = 1000000
  def filterPrimes(upper: Int, seed: Int = 2, sieve: List[Int] = List()): List[Int] =
    sieve.map(x => if (x % seed == 0 && x > seed) 0 else x).filter(_ > 0)

  var filtered: List[Int] = (2 to max).toList
  for (i <- 2 to max / 2) filtered = filterPrimes(max, i, filtered)
  filtered.foreach(println(_))
asked Oct 27 '11 by deltanovember

3 Answers

There are a few potential issues, although I don't really see a single "smoking gun"... Anyway, here's what I've got. First:

sieve.map(x => if (x % seed == 0 && x > seed) 0 else x).filter(_ > 0)

could be written more concisely as:

sieve.filter(x => x <= seed || x % seed != 0)

Next, upper is unused in filterPrimes (this should have no effect on performance though).

Third, don't use a var and a for loop if you want to really use a pure functional style, instead turn filterPrimes into a tail-recursive function. The compiler might be clever enough to optimize away the copies if you do it this way (although I wouldn't hold my breath).

Finally, and probably most importantly, your for loop is wasting a huge amount of time filtering out values that have necessarily already been filtered. For example, it tries to filter multiples of 4 after already having filtered all multiples of 2. If you want to use this sieve algorithm efficiently, you need to choose your seeds from the remaining elements in the list.

In other words, keep an index into the list, and determine the seed from the index, like:

iteration 0: 2 3 4 5 6 7 8 9 ...
      index: ^

iteration 1: 2 3 5 7 9 ...
      index:   ^

iteration 2: 2 3 5 7 ...
      index:     ^

This avoids the duplicate effort. Also, you don't need to keep iterating until you reach max; you can stop once the seed exceeds sqrt(max).
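Putting those suggestions together — seeds drawn from the remaining list, tail recursion instead of a var and a for loop, and stopping past sqrt(max) — a sketch might look like this (the names here are mine, not from the question):

```scala
object SieveSketch {
  // Tail-recursive sieve: each seed is taken from the head of the
  // remaining list, so every seed is itself prime and no duplicate
  // filtering work is done.
  def tailRecSieve(max: Int): List[Int] = {
    @annotation.tailrec
    def go(rest: List[Int], primes: List[Int]): List[Int] = rest match {
      case p :: tail if p.toLong * p <= max =>
        // p is prime; drop its multiples from the tail and recurse.
        go(tail.filter(_ % p != 0), p :: primes)
      case _ =>
        // Once p * p > max, everything left in rest is already prime.
        primes.reverse ::: rest
    }
    go((2 to max).toList, Nil)
  }
}
```

Because `go` is annotated `@annotation.tailrec`, the compiler will reject it if the recursive call ever stops being in tail position.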

answered Nov 03 '22 by mergeconflict

If you'd like to see a functional way of doing the sieve, check out The Genuine Sieve of Eratosthenes.

answered Nov 03 '22 by Ken Wayne VanderLinde


I would make a few modifications.

  • It seems odd that you perform filterPrimes for all numbers between 2 and max / 2; the "actual" sieve technique only requires you to perform filterPrimes for the primes between 2 and sqrt(max).
  • It also seems odd that you use a var and a for loop. To do it the "functional" way, I would use a recursive function instead.
  • Instead of performing filterPrimes on the entire list, you can collect the primes as you go; no need to throw those through the filter over and over.
  • It is rather strange to map and then filter the way you do; since the map simply flags which elements to filter out, you can accomplish the same using only filter.

So here was my first attempt at these modifications:

def filterFactors(seed: Int, xs: List[Int]) = {
  xs.filter(x => x % seed != 0)
}

def sieve(max: Int) = {
  def go(xs: List[Int]) : List[Int] = xs match {
    case y :: ys => {
      if (y*y > max) y :: ys
      else y :: go(filterFactors(y, ys))
    }
    case Nil => Nil
  }

  go((2 to max).toList)
}

However, this reflects my Haskell bias, and has a huge flaw: the call y :: go(...) in the go helper function is not in tail position, so each stack frame keeps its intermediate list reachable and the lists pile up in memory. Running sieve(1000000) resulted in an "OutOfMemoryError" for me.

Let's try a common FP trick: tail recursion with accumulators.

def sieve(max: Int) = {
  def go(xs: List[Int],
         acc: List[Int])
         : List[Int] = xs match {
    case y :: ys => {
      if (y*y > max) acc.reverse ::: (y :: ys)
      else go(filterFactors(y, ys), y :: acc)
    }
    case Nil => acc.reverse // don't drop the primes collected so far
  }

  go((2 to max).toList, Nil)
}

By adding an accumulator value we are able to write the go helper function in tail-recursive form, thus avoiding the huge stack problem from before. (Haskell's evaluation strategy is very different; it therefore neither needs nor benefits from tail recursion)

Now let's compare speed with an imperative, mutation-based approach.

def mutationSieve (max: Int) = {
    var arr: Array[Option[Int]] =
        (2 to max).map (x => Some (x)).toArray
    var i = 0
    var seed = (arr (i)).get
    while (seed * seed <= max) { // <=, so max = p*p still gets crossed off
        for (j: Int <- (i + seed) to (max - 2) by seed) {
          arr (j) = None
        }
        i += 1
        while (arr (i).isEmpty) {
          i += 1
        }
        seed = (arr (i)).get
    }
    arr.flatten
}

Here I use an Array[Option[Int]], and "cross off" a number by replacing its entry with None. There is a little room for optimization; a small speed boost could probably be obtained by using an array of Booleans, where the index represents the particular number. Whatever.
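For what it's worth, that Boolean-array variant might look like the sketch below (my own code, not part of the benchmark above). Index i stands for the number i itself, which avoids boxing every entry in an Option:

```scala
object BoolSieveSketch {
  // Classic array-of-Booleans sieve: composite(i) is true once the
  // number i has been crossed off. Indices 0 and 1 are simply unused.
  def boolSieve(max: Int): Array[Int] = {
    val composite = new Array[Boolean](max + 1) // all false initially
    var p = 2
    while (p * p <= max) {
      if (!composite(p)) {
        // p is prime; cross off its multiples, starting at p * p
        // (smaller multiples were crossed off by smaller primes).
        var j = p * p
        while (j <= max) { composite(j) = true; j += p }
      }
      p += 1
    }
    (2 to max).filter(!composite(_)).toArray
  }
}
```

Starting the inner loop at p * p rather than 2 * p is the same sqrt(max) observation as before, applied to the crossing-off step.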

Using very primitive techniques (carefully placed new Date() calls...) I benchmarked the functional version at approximately 6 times slower than the imperative version. The two approaches clearly have the same big-O time complexity, but the constant factors involved in programming with linked lists do incur a cost.

I also benchmarked your version, using Math.sqrt(max).ceil.toInt instead of max / 2: it was about 15x slower than the functional version I presented here. Interestingly, approximately 1 out of every 7 numbers between 1 and 1000 (sqrt(1000000)) is prime (1 / ln(1000)), so a large part of the slowdown can be attributed to the fact that you run the loop for every single number, while I run my function only for every prime. And if it takes 15x as long to perform ~1000 iterations, it would take ~7500x as long to perform 500000 iterations, which is why your original code is agonizingly slow.
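The "carefully placed" timing calls can be wrapped in a tiny helper; this is my own sketch (using System.nanoTime rather than new Date()), not code from the benchmark:

```scala
object TimingSketch {
  // Run body, print how long it took, and return its result.
  def time[A](label: String)(body: => A): A = {
    val start = System.nanoTime()
    val result = body // force the by-name computation
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"$label took $elapsedMs%.1f ms")
    result
  }

  // e.g. val primes = time("functional sieve")(sieve(1000000))
}
```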

answered Nov 03 '22 by Dan Burton