I'm quite new on Scala and so in order to start writing some code I've implemented this simple program:
package org.primes.sim
object Primes {
def is_prime(a: Int): Boolean = {
val l = Stream.range(3, a, 2) filter { e => a % e == 0}
l.size == 0
}
def gen_primes(m: Int) =
2 #:: Stream.from(3, 2) filter { e => is_prime(e) } take m
def primes(m : Int) = {
gen_primes(m) foreach println
}
def main(args: Array[String]) {
if (args.size == 0)
primes(10)
else
primes(args(0).toInt)
}
}
It generates n primes starting from 2. Then I've implemented the same algorithm in C++11 using range-v3 library of Eric Nibler.This is the code:
#include <iostream>
#include <vector>
#include <string>
#include <range/v3/all.hpp>
using namespace std;
using namespace ranges;
inline bool is_even(unsigned int n) { return n % 2 == 0; }
inline bool is_prime(unsigned int n)
{
if (n == 2)
return true;
else if (n == 1 || is_even(n))
return false;
else
return ranges::any_of(
view::iota(3, n) | view::remove_if(is_even),
[n](unsigned int e) { return n % e == 0; }
) == false;
}
void primes(unsigned int n)
{
auto rng = view::ints(2) | view::filter(is_prime);
ranges::for_each(view::take(rng, n), [](unsigned int e){ cout << e << '\n'; });
}
int main(int argc, char* argv[])
{
if (argc == 1)
primes(100);
else if (argc > 1)
{
primes(std::stoi(argv[1]));
}
}
As you can see the code looks very similar but the performance are very different:
For n = 5000, C++ completes in 0,265s instead Scala completes in 24,314s!!! So, from this test, Scala seems 100x slower than C++11.
Which is the problem on Scala code? Could you give me some hints for a better usage of scalac?
Note: I've compiled the C++ code using gcc 4.9.2 and -O3 opt.
Thanks
The main speed problem lies with your is_prime
implementation.
First of all, you filter a Stream to find all divisors, and then check if there were none (l.size == 0
). But it's faster to return false
as soon as the first divisor is found:
def is_prime(a: Int): Boolean =
Stream.range(3, a, 2).find(a % _ == 0).isEmpty
This decreased runtime from 22 seconds to 5 seconds for primes(5000)
on my machine.
The second problem is Stream
itself. Scala Streams are slow, and using them for simple number calculations is a huge overkill. Replacing Stream
with Range
decreased runtime further to 1,2 seconds:
def is_prime(a: Int): Boolean =
3.until(a, 2).find(a % _ == 0).isEmpty
That's decent: 5x slower than C++. Usually, I'd stop here, but it is possible to decrease running-time a bit more if we remove the higher-order function find
.
While nice-looking and functional, find
also induces some overhead. Loop implementation (basically replacing find
with foreach
) further decreased runtime to 0,45 seconds, which is less than 2x slower than C++ (that's already on the order of JVM overhead):
def is_prime(a: Int): Boolean = {
for (e <- 3.until(a, 2)) if (a % e == 0) return false
true
}
There's another Stream in gen_primes
, so doing something with it may improve the run time more, but in my opinion that's not necessary. At that point in performance improvement, I think it would be better to switch to some other algorithm of generating primes: e.g., using only primes, instead of all odd numbers, to look for divisors, or using Sieve of Eratosthenes.
All in all, functional abstractions in Scala are implemented with actual objects on the heap, which have some overhead, and JIT compiler can't fix everything. But the selling point of C++ is zero-cost abstractions: everything that is possible is expanded during compilation through template
s, constexpr
and further aggressively optimized by the compiler.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With