
Why is this simple C++ addition 6 times slower than the equivalent Java?

Hello Stack Overflow users, this is my first question, so if there are any errors in how I've expressed it, please point them out. Thank you.

I wrote this simple calculation in both Java and C++:

Java:

long start = System.nanoTime();
long total = 0;
for (int i = 0; i < 2147483647; i++) {
    total += i;
}
System.out.println(total);
System.out.println(System.nanoTime() - start);

C++:

auto start = chrono::high_resolution_clock::now();
register long long total = 0;
for (register int i = 0; i < 2147483647; i++)
{
    total += i;
}
cout << total << endl;
auto finish = chrono::high_resolution_clock::now();
cout << chrono::duration_cast<chrono::nanoseconds>(finish - start).count() << endl;

Software: JDK 8u11, Microsoft Visual C++ Compiler (2013)

Results (the computed total, then the elapsed time in nanoseconds):

Java: 2305843005992468481 1096361110

C++: 2305843005992468481 6544374300

The calculation results are the same, which is good. However, the printed nanosecond times show that the Java program takes about 1 second, while the C++ program takes about 6.5 seconds.

I've been doing Java for quite some time, but I'm new to C++. Is there a problem in my code, or is C++ simply slower than Java at simple calculations?

Also, I used the register keyword in my C++ code, hoping it would improve performance, but the execution time doesn't change at all. Could someone explain this?

EDIT: My mistake was that the C++ compiler settings were not optimized and the output was set to x32 (Win32). After enabling /O2, targeting Win64 and switching away from the Debug configuration, the program took only 0.7 seconds to execute.

Java applies optimizations by default (in the JVM's JIT compiler), but VC++ does not: its default settings favor compilation speed. Different C++ compilers also produce different results; some calculate the loop's result at compile time, leading to extremely short execution times (around 5 microseconds).

NOTE: Given the right conditions, the C++ program performs better than Java in this simple test. However, I noticed that C++ skips many of the runtime safety checks that Java performs as a "safe language". I believe C++ would outperform Java by even more in a large-array test, since plain array and operator[] accesses are not bounds-checked.

asked Mar 20 '23 00:03 by shingekinolinus


1 Answer

On Linux/Debian/Sid/x86-64, using OpenJDK 7 with

// file test.java
class Test {
    public static void main(String[] args) {
        long start = System.nanoTime();
        long total = 0;
        for (int i = 0; i < 2147483647; i++) {
            total += i;
        }
        System.out.println(total);
        System.out.println(System.nanoTime() - start);
    }
}

and GCC 4.9 with

// file test.cc
#include <iostream>
#include <chrono>

int main(int argc, char** argv) {
    using namespace std;
    auto start = chrono::high_resolution_clock::now();
    long long total = 0;
    for (int i = 0; i < 2147483647; i++)
    {
        total += i;
    }
    cout << total << endl;
    auto finish = chrono::high_resolution_clock::now();
    cout << chrono::duration_cast<chrono::nanoseconds>(finish - start).count()
         << endl;
}

Then compiling and running test.java with

javac test.java
java Test

I'm getting the output

2305843005992468481
774937152

when compiling test.cc with optimizations

g++ -O2 -std=c++11 test.cc -o test-gcc

and running ./test-gcc, it runs much faster:

2305843005992468481
40291

Of course, without optimizations (g++ -std=c++11 test.cc -o test-gcc) the run is much slower:

2305843005992468481
5208949116

By looking at the assembler code generated with g++ -O2 -fverbose-asm -S -std=c++11 test.cc, I see that the compiler computed the result at compile time (note the constant 2305843005992468481 loaded by movabsq; there is no loop left):

    .globl  main
    .type   main, @function
  main:
  .LFB1530:
    .cfi_startproc
    pushq   %rbx    #
    .cfi_def_cfa_offset 16
    .cfi_offset 3, -16
    call    _ZNSt6chrono3_V212system_clock3nowEv    #
    movabsq $2305843005992468481, %rsi  #,
    movl    $_ZSt4cout, %edi    #,
    movq    %rax, %rbx  #, start
    call    _ZNSo9_M_insertIxEERSoT_    #
    movq    %rax, %rdi  # D.35007,
    call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_  #
    call    _ZNSt6chrono3_V212system_clock3nowEv    #
    subq    %rbx, %rax  # start, D.35008
    movl    $_ZSt4cout, %edi    #,
    movq    %rax, %rsi  # D.35008, D.35008
    call    _ZNSo9_M_insertIlEERSoT_    #
    movq    %rax, %rdi  # D.35007,
    call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_  #
    xorl    %eax, %eax  #
    popq    %rbx    #
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
  .LFE1530:
    .size   main, .-main

So you just need to enable optimizations in your compiler (or switch to a better compiler, like GCC 4.9).

BTW, in Java the low-level optimizations happen in the JVM's JIT compiler. I don't know Java well, but I don't think you need to switch them on. I do know that with GCC you need to enable optimizations explicitly (e.g. with -O2), and of course those happen ahead of time.

PS: I have never used any Microsoft compiler in this century, so I cannot help you with enabling optimizations in it.

Finally, I don't believe such microbenchmarks are significant. Benchmark, then optimize, your real applications.

answered Mar 25 '23 16:03 by Basile Starynkevitch