Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is Kotlin's map-filter-reduce slower than Java's Stream operations on large inputs?

Tags:

java

kotlin

A few days ago I created a simple benchmark (without jmh and all of the another specialized stuff, just to measure roughly).

I've found that for the same simple task (iterate through 10 million numbers, square them, filter only even numbers and reduce their sum), Java works much faster. Here's the code:

Kotlin:

fun test() {
    println((0 .. 10_000_000L).map { it * it }
                              .filter { it % 2 == 0L }
                              .reduce { sum, it -> sum + it })
}

Java:

public void test() {
    System.out.println(LongStream.range(0, 10_000_000)
                                 .map(it -> it * it)
                                 .filter(it -> it % 2 == 0)
                                 .reduce((sum, it) -> sum + it)
                                 .getAsLong());
}

I'm using Java version 1.8.0_144 and Kotlin version 1.2.

On my hardware in average it takes 85ms for Java and 4,470ms for Kotlin to execute the corresponding functions. Kotlin works 52 times slower.

I suspect that the Java compiler produces optimized bytecode, but I didn't expected to see such a huge difference. I'm wondering if I'm doing something wrong? How can I compel Kotlin to work faster? I like it because of its syntax, but 52 times is a big difference. And I just wrote Java 8-like code, not the plain old iterative version (which, I believe, will be much faster than given one).

like image 778
the_kaba Avatar asked Jan 18 '18 09:01

the_kaba


People also ask

Does Kotlin code run faster than Java code?

Java is a faster programming language than Kotlin. As one test revealed that Java has ~13% faster compilation speeds (with Gradle) than Kotlin (14.2 seconds vs 16.6 seconds) on average. However, the difference in speed is only for full builds.

What is the difference between map and filter in Java 8?

Filter takes a predicate as an argument so basically you are validating your input/collection against a condition, whereas a map allows you to define or use a existing function on the stream eg you can apply String.

What is MAP filter reduce in Java?

map applies as a transformation to an element. filter accumulates only elements matching a Predicate<T> . reduce accumulates all elements to a single value, by using immutable values.

Is kotlin slower?

Kotlin generates very similar bytecode to Java, so the performance of Kotlin code is in most cases the same as the performance of the equivalent Java code.


1 Answers

When you compare apples to oranges, the results don't tell you much. You compared one API to another API, each having a totally different focus and goals.

Since all of JDK is as much "Kotlin" as the Kotlin-specific additions, I wrote more of an apples-to-apples comparison, which also takes care of some of the "JVM microbenchmark" concerns.

Kotlin:

fun main(args: Array<String>) {
    println("Warming up Kotlin")
    test()
    test()
    test()
    println("Measuring Kotlin")
    val average = (1..10).map {
        measureTimeMillis { test() }
    }.average()
    println("An average Kotlin run took $average ms")
    println("(sum is $sum)")
}

var sum = 0L

fun test() {
    sum += LongStream.range(0L, 100_000_000L)
            .map { it * it }
            .filter { it % 2 == 0L }
            .reduce { sum, it -> sum + it }
            .asLong
}

Java:

public static void main(String[] args) {
    System.out.println("Warming up Java");
    test();
    test();
    test();
    System.out.println("Measuring Java");
    LongSummaryStatistics stats = LongStream.range(0, 10)
                                            .map(i -> measureTimeMillis(() -> test()))
                                            .summaryStatistics();
    System.out.println("An average Java run took " + stats.getAverage() + " ms");
    System.out.println("sum is " + sum);

}

private static long sum;

private static void test() {
    sum += LongStream.range(0, 100_000_000)
                     .map(it -> it * it)
                     .filter(it -> it % 2 == 0)
                     .reduce((sum, it) -> sum + it)
                     .getAsLong();
}

private static long measureTimeMillis(Runnable measured) {
    long start = System.nanoTime();
    measured.run();
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
}

My results:

Warming up Kotlin
Measuring Kotlin
An average Kotlin run took 158.5 ms
(sum is 4276489111714942720)


Warming up Java
Measuring Java
An average Java run took 357.3 ms
sum is 4276489111714942720

Suprised? I was too.

Instead of digging further, trying to figure out this inversion of the expected results, I would like to make this conclusion:

Kotlin's FP extensions on Iterable are there for convenience. In 95% of all use cases you don't care whether it takes 1 or 2 µs to perform a quick map-filter on a list of 10-100 elements.

Java's Stream API is focused on the performance of bulk operations on large data structures. It also offers auto-parallelization towards the same goal (although it almost never actually helps), but its API is crippled and at times awkward due to these concerns. For example, many useful operations which don't happen to parallelize well are just not there, and the whole paradigm of non-terminal vs. terminal operations adds bulk to each and every Streams expression you write.


Let me also address a few more of your statements:

I know that the Java compiler produces optimized bytecode

This is a) not true and b) largely irrelevant because there is (almost) no such thing as "optimized bytecode". Interpreted execution of bytecode is always at least an order of magnitude slower than JIT-compiled native code.

And I just wrote Java 8-like code, not the plain old iterative version (which, I believe, will be much faster than given one).

You mean this?

Kotlin:

fun test() {
    var sum: Long = 0
    var i: Long = 0
    while (i < 100_000_000) {
        val j = i * i
        if (j % 2 == 0L) {
            sum += j
        }
        i++
    }
    total += sum
}

Java:

private static void test() {
    long sum = 0;
    for (long i = 0; i < 100_000_000; i++) {
        long j  = i * i;
        if (j % 2 == 0) {
            sum += j;
        }
    }
    total += sum;
}

These are the results:

Warming up Kotlin
Measuring Kotlin
An average Kotlin run took 150.1 ms
(sum is 4276489111714942720)

Warming up Java
Measuring Java
An average Java run took 153.0 ms
sum is 4276489111714942720

In both languages the performance is almost the same as Kotlin + Streams API above. As said, the Streams API is optimized for performance.

Both kotlinc and javac probably produced very similar bytecode given this straightforward source code, then HotSpot did its work on both the same way.

like image 77
Marko Topolnik Avatar answered Sep 17 '22 05:09

Marko Topolnik