Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala performance on primes algorithm

I'm quite new on Scala and so in order to start writing some code I've implemented this simple program:

package org.primes.sim

object Primes {

  def is_prime(a: Int): Boolean = {
    val l = Stream.range(3, a, 2) filter { e => a % e == 0}
    l.size == 0
  }

  def gen_primes(m: Int) = 
    2 #:: Stream.from(3, 2) filter { e => is_prime(e) } take m

  def primes(m : Int) = {
      gen_primes(m) foreach println      
  }

  def main(args: Array[String]) {
    if (args.size == 0)
      primes(10)
    else
      primes(args(0).toInt)
  }

}

It generates n primes starting from 2. Then I've implemented the same algorithm in C++11 using range-v3 library of Eric Nibler.This is the code:

#include <iostream>
#include <vector>
#include <string>
#include <range/v3/all.hpp>

using namespace std;
using namespace ranges;

inline bool is_even(unsigned int n) { return n % 2 == 0; }

inline bool is_prime(unsigned int n)
{
    if (n == 2)
        return true;
    else if (n == 1 || is_even(n))
        return false;
    else
        return ranges::any_of(
                view::iota(3, n) | view::remove_if(is_even),
                [n](unsigned int e) { return n % e == 0; }
            ) == false;
}

void primes(unsigned int n)
{
    auto rng = view::ints(2) | view::filter(is_prime);
    ranges::for_each(view::take(rng, n), [](unsigned int e){ cout << e << '\n'; });
}

int main(int argc, char* argv[])
{
    if (argc == 1)
        primes(100);
    else if (argc > 1)
    {
        primes(std::stoi(argv[1]));
    }
}

As you can see the code looks very similar but the performance are very different:

For n = 5000, C++ completes in 0,265s instead Scala completes in 24,314s!!! So, from this test, Scala seems 100x slower than C++11.

Which is the problem on Scala code? Could you give me some hints for a better usage of scalac?

Note: I've compiled the C++ code using gcc 4.9.2 and -O3 opt.

Thanks

like image 404
Gian Lorenzo Meocci Avatar asked May 03 '15 12:05

Gian Lorenzo Meocci


1 Answers

The main speed problem lies with your is_prime implementation.

First of all, you filter a Stream to find all divisors, and then check if there were none (l.size == 0). But it's faster to return false as soon as the first divisor is found:

def is_prime(a: Int): Boolean =
  Stream.range(3, a, 2).find(a % _ == 0).isEmpty

This decreased runtime from 22 seconds to 5 seconds for primes(5000) on my machine.

The second problem is Stream itself. Scala Streams are slow, and using them for simple number calculations is a huge overkill. Replacing Stream with Range decreased runtime further to 1,2 seconds:

def is_prime(a: Int): Boolean =
  3.until(a, 2).find(a % _ == 0).isEmpty

That's decent: 5x slower than C++. Usually, I'd stop here, but it is possible to decrease running-time a bit more if we remove the higher-order function find.

While nice-looking and functional, find also induces some overhead. Loop implementation (basically replacing find with foreach) further decreased runtime to 0,45 seconds, which is less than 2x slower than C++ (that's already on the order of JVM overhead):

def is_prime(a: Int): Boolean = {
  for (e <- 3.until(a, 2)) if (a % e == 0) return false
  true
}

There's another Stream in gen_primes, so doing something with it may improve the run time more, but in my opinion that's not necessary. At that point in performance improvement, I think it would be better to switch to some other algorithm of generating primes: e.g., using only primes, instead of all odd numbers, to look for divisors, or using Sieve of Eratosthenes.

All in all, functional abstractions in Scala are implemented with actual objects on the heap, which have some overhead, and JIT compiler can't fix everything. But the selling point of C++ is zero-cost abstractions: everything that is possible is expanded during compilation through templates, constexpr and further aggressively optimized by the compiler.

like image 129
Kolmar Avatar answered Oct 26 '22 09:10

Kolmar