I wrote a little benchmark to compare the performance of different interpreters/compilers for Python, Ruby, JavaScript and C++. As expected, it turns out that (optimized) C++ beats the scripting languages, but the factor by which it does so is incredibly high.
The results are:
sven@jet:~/tmp/js$ time node bla.js # * JavaScript with node * 0 real 0m1.222s user 0m1.190s sys 0m0.015s sven@jet:~/tmp/js$ time ruby foo.rb # * Ruby * 0 real 0m52.428s user 0m52.395s sys 0m0.028s sven@jet:~/tmp/js$ time python blub.py # * Python with CPython * 0 real 1m16.480s user 1m16.371s sys 0m0.080s sven@jet:~/tmp/js$ time pypy blub.py # * Python with PyPy * 0 real 0m4.707s user 0m4.579s sys 0m0.028s sven@jet:~/tmp/js$ time ./cpp_non_optimized 1000 1000000 # * C++ with -O0 (gcc) * 0 real 0m1.702s user 0m1.699s sys 0m0.002s sven@jet:~/tmp/js$ time ./cpp_optimized 1000 1000000 # * C++ with -O3 (gcc) * 0 real 0m0.003s # (!!!) <---------------------------------- WHY? user 0m0.002s sys 0m0.002s
I am wondering if anybody can provide an explanation why the optimized C++ code is over three orders of magnitude faster than everything else.
The C++ benchmark uses command-line parameters in order to prevent pre-computing the result at compile time.
Below, I placed the source codes for the different language benchmarks, which should be semantically equivalent. Also, I provided the assembly code for the optimized C++ compiler output (using gcc). When looking at the optimized assembly, it seems that the compiler merged the two loops in the benchmark to a single one, but nevertheless, there IS still a loop!
JavaScript:
var s = 0; var outer = 1000; var inner = 1000000; for (var i = 0; i < outer; ++i) { for (var j = 0; j < inner; ++j) { ++s; } s -= inner; } console.log(s);
Python:
s = 0 outer = 1000 inner = 1000000 for _ in xrange(outer): for _ in xrange(inner): s += 1 s -= inner print s
Ruby:
s = 0 outer = 1000 inner = 1000000 outer_end = outer - 1 inner_end = inner - 1 for i in 0..outer_end for j in 0..inner_end s = s + 1 end s = s - inner end puts s
C++:
#include <iostream> #include <cstdlib> #include <cstdint> int main(int argc, char* argv[]) { uint32_t s = 0; uint32_t outer = atoi(argv[1]); uint32_t inner = atoi(argv[2]); for (uint32_t i = 0; i < outer; ++i) { for (uint32_t j = 0; j < inner; ++j) ++s; s -= inner; } std::cout << s << std::endl; return 0; }
Assembly (when compiling the above C++ code with gcc -S -O3 -std=c++0x):
.file "bar.cpp" .section .text.startup,"ax",@progbits .p2align 4,,15 .globl main .type main, @function main: .LFB1266: .cfi_startproc pushq %r12 .cfi_def_cfa_offset 16 .cfi_offset 12, -16 movl $10, %edx movq %rsi, %r12 pushq %rbp .cfi_def_cfa_offset 24 .cfi_offset 6, -24 pushq %rbx .cfi_def_cfa_offset 32 .cfi_offset 3, -32 movq 8(%rsi), %rdi xorl %esi, %esi call strtol movq 16(%r12), %rdi movq %rax, %rbp xorl %esi, %esi movl $10, %edx call strtol testl %ebp, %ebp je .L6 movl %ebp, %ebx xorl %eax, %eax xorl %edx, %edx .p2align 4,,10 .p2align 3 .L3: # <--- Here is the loop addl $1, %eax # <--- cmpl %eax, %ebx # <--- ja .L3 # <--- .L2: movl %edx, %esi movl $_ZSt4cout, %edi call _ZNSo9_M_insertImEERSoT_ movq %rax, %rdi call _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_ popq %rbx .cfi_remember_state .cfi_def_cfa_offset 24 popq %rbp .cfi_def_cfa_offset 16 xorl %eax, %eax popq %r12 .cfi_def_cfa_offset 8 ret .L6: .cfi_restore_state xorl %edx, %edx jmp .L2 .cfi_endproc .LFE1266: .size main, .-main .p2align 4,,15 .type _GLOBAL__sub_I_main, @function _GLOBAL__sub_I_main: .LFB1420: .cfi_startproc subq $8, %rsp .cfi_def_cfa_offset 16 movl $_ZStL8__ioinit, %edi call _ZNSt8ios_base4InitC1Ev movl $__dso_handle, %edx movl $_ZStL8__ioinit, %esi movl $_ZNSt8ios_base4InitD1Ev, %edi addq $8, %rsp .cfi_def_cfa_offset 8 jmp __cxa_atexit .cfi_endproc .LFE1420: .size _GLOBAL__sub_I_main, .-_GLOBAL__sub_I_main .section .init_array,"aw" .align 8 .quad _GLOBAL__sub_I_main .local _ZStL8__ioinit .comm _ZStL8__ioinit,1,1 .hidden __dso_handle .ident "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2" .section .note.GNU-stack,"",@progbits
The programs that you write in C compile and execute much faster than those written in other languages. This is because it does not have garbage collection and other such additional processing overheads. Hence, the language is faster as compared to most other programming languages.
Another reason is closeness of C to the Assembly language, in most of the cases its instructions directly map to assembly language, C is only one or level two levels of abstraction away from assembly language while Java is at minimum 3 levels abstraction away from assembler.
Reason 1: Tight Data Structures. First, C++ is intrinsically stingy with memory (unlike Java objects, a C++ struct has no memory overhead if there are no virtual functions [modulo word alignment issues]). Smaller things run faster due to caching, and are also more scalable. Of course, this is true of C, too.
Performance-based on Nature Of Language C++ language is an object-oriented programming language, and it supports some important features like Polymorphism, Abstract Data Types, Encapsulation, etc. Since it supports object-orientation, speed is faster compared to the C language.
The optimizer has worked out that the inner loop along with the subsequent line is a no-op, and eliminated it. Unfortunately it hasn't managed to eliminate the outer loop as well.
Note that the node.js example is faster than the unoptimized C++ example, indicating that V8 (node's JIT compiler) has managed to eliminate at least one of the loops. However, its optimization has some overhead, as (like any JIT compiler) it must balance the opportunities for optimization and profile-guided re-optimization against the cost of doing so.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With