Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest language for FOR loops

I'm trying to figure out the best programming language for an analytical model I'm building. Primary consideration is speed at which it will run FOR loops.

Some detail:

  • The model needs to perform numerous (~30 per entry, over 12 cycles) operations on a set of elements from an array -- there are ~300k rows, and ~150 columns in the array. Most of these operations are logical in nature, e.g., if place(i) = 1, then j(i) = 2.
  • I've built an earlier version of this model using Octave -- to run it takes ~55 hours on an Amazon EC2 m2.xlarge instance (and it uses ~10 GB of memory, but I'm perfectly happy to throw more memory at it). Octave/Matlab won't do elementwise logical operations, so a large number of for loops are needed -- I'm relatively certain that I've vectorized as much as possible -- the loops that remain are necessary. I've gotten octave-multicore to work with this code, which makes some improvement (~30% speed reduction when I get it running on 8 EC2 cores), but ends up being unstable with file locking, etc. +I'm really looking for a step change in run-time -- I know that actually using Matlab might get me as much as a 50% improvement from looking at some benchmarks, but that is cost-prohibitive. The original plan when starting this was to actually run a Monte Carlo with this, but at 55 hours a run, that's completely impractical.
  • The next version of this is going to be a complete rebuild from the ground up (for IP reasons I won't get in to if nothing else), so I'm completely open to any programming language. I'm most familiar with Octave/Matlab, but have dabbled in R, C, C++, Java. I'm also proficient w/ SQL if the solution involves storing the data in a database. I'll learn any language for this -- these aren't complicated functionality we're looking for, no interfacing with other programs, etc., so not too concerned about learning curve.

So with all that said, what's the fastest programming language specifically for FOR loops? From a search of SO and Google, Fortran and C bubble to the top, but looking for some more advice before diving in to one or the other.

Thanks!

like image 798
Noah Avatar asked Jul 07 '10 00:07

Noah


People also ask

Which language has the fastest for loop?

C++ is one of the most efficient and fastest languages.

Which for loop is faster?

The for loop is the fastest but poorly readable. The foreach is fast, and iteration is controllable. The for…of takes time, but it's sweeter. The for…in takes time, hence less convenient.

What is the fastest scripting language?

Several benchmarks show Lua as the fastest language in the realm of interpreted scripting languages. Lua is fast not only in fine-tuned benchmark programs, but in real life too. Substantial fractions of large applications have been written in Lua.

Is Fortran faster than C?

Fortran semantics say that function arguments never alias and there is an array type, where in C arrays are pointers. This is why Fortran is often faster than C. This is why numerical libraries are still written in Fortran. However, it comes at the cost of pointer arithmetic.


3 Answers

This for loop looks no more complex than this when it hits the CPU:

for(int i = 0; i != 1024; i++) translates to

mov r0, 0           ;;start the counter    
top:

;;some processing

add r0, r0, 1       ;;increment the counter by 1
jne top: r0, 1024   ;;jump to the loop top if we havn't hit the top of the for loop (1024 elements)

;;continue on

As you can tell, this is sufficiently simple you can't really optimize it very well[1]... Refocus towards the algorithm level.

The first cut at the problem is to look at cache locality. Look up the classic example of matrix multiplication and swapping the i and j indexes.

edit: As a second cut, I would suggest evaluating the algorithm for data-dependencies between iterations and data dependency between localities in your 'matrix' of data. It may be a good candidate for parallelization.

[1] There are some micro-optimizations possible, but those will not produce the speedsups you're looking for.

like image 151
Paul Nathan Avatar answered Sep 23 '22 12:09

Paul Nathan


~300k * ~150 * ~30 * ~12 = ~16G iterations, right? This number of primitive operations should complete in a matter of minutes (if not seconds) in any compiled language on any decent CPU. Fortran, C/C++ should do it almost equally well. Even managed languages, such as Java and C#, would only fall behind by a small margin (if at all).

If you have a problem of ~16G iterations running 55 hours, this means that they are very far from being primitive (80k per second? this is ridiculous), so maybe we should know more. (as was already suggested, is disk access limiting performance? is it network access?)

like image 21
Rotsor Avatar answered Sep 22 '22 12:09

Rotsor


As @Rotsor said, 16G operations / 55 hours is about 80,000 operations per second, or one operation every 12.5 microseconds. That's a lot of time per operation.

That means your loops are not the cause of poor performance, it's what's in the innermost loop that's taking the time. And Octave is an interpreted language. That alone means an order of magnitude slowdown.

If you want speed, you first need to be in a compiled language. Then you need to do performance tuning (aka profiling) or, just single step it in a debugger at the instruction level. That will tell you what it is actually doing in its heart of hearts. Once you've got it to where it's not wasting cycles, fancier hardware, cores, CUDA, etc. will give you a parallelism speedup. But it's silly to do that if your code is taking unnecessarily many cycles. (Here's an example of performance tuning - a 43x speedup just by trimming the fat.)

I can't believe the number of responders talking about matlab, APL, and other vectorized languages. Those are interpreters. They give you concise source code, which is not at all the same thing as fast execution. When it comes down to the bare metal, they are stuck with the same hardware as every other language.


Added: to show you what I mean, I just ran this C++ code, which does 16G operations, on this moldy old laptop, and it took 94 seconds, or about 6ns per iteration. (I can't believe you baby-sat that thing for 2 whole days.)

void doit(){
  double sum = 0;
  for (int i = 0; i < 1000; i++){
    for (int j = 0; j < 16000000; j++){
      sum += j * 3.1415926;
    }
  }
}
like image 27
Mike Dunlavey Avatar answered Sep 23 '22 12:09

Mike Dunlavey