As part of my Ph.D. research, I am working on development of numerical models of atmosphere and ocean circulation. These involve numerically solving systems of PDE's on the order of ~10^6 grid points, over ~10^4 time steps. Thus, a typical model simulation takes hours to a few days to complete when run in MPI on dozens of CPUs. Naturally, improving model efficiency as much as possible is important, while making sure the results are byte-to-byte identical.
While I feel quite comfortable with my Fortran programming, and am aware of quite some tricks to make code more efficient, I feel like there is still space to improve, and tricks that I am not aware of.
Currently, I make sure I use as few divisions as possible, and try not to use literal constants (I was taught to do this from very early on, e.g. use half=0.5 instead of 0.5 in actual computations), use as few transcendental functions as possible etc.
What other performance sensitive factors are there? At the moment, I am wondering about a few:
1) Does the order of mathematical operations matter? For example if I have:
a=1E-7 ; b=2E4 ; c=3E13
d=a*b*c
would d evaluate with different efficiency based on the order of multiplication? Nowadays, this must be compiler specific, but is there a straight answer? I notice d getting (slightly) different value based on the order (precision limit), but will this impact the efficiency or not?
2) Passing lots (e.g. dozens) of arrays as arguments to a subroutine versus accessing these arrays from a module within the subroutine?
3) Fortran 95 constructs (FORALL and WHERE) versus DO and IF? I know that these mattered back in the 90's when code vectorization was a big thing, but is there any difference now with modern compilers being able to vectorize explicit DO loops? (I am using PGI, Intel, and IBM compilers in my work)
4) Raising a number to an integer power versus multiplication? E.g.:
b=a**4
or
b=a*a*a*a
I have been taught to always use the latter where possible. Does this affect efficiency and/or precision? (probably compiler dependent as well)
Please discuss and/or add any tricks and tips that you know about improving Fortran code efficiency. What else is out there? If you know anything specific to what each of the compilers above do related to this question, please include that as well.
Added: Note that I do not have any bottlenecks or performance issues per se. I am asking if there are any general rules for optimizing the code in sense of operations.
Thanks!
One of the reasons for this has to do with aliasing . Fortran has a syntax that is flexible enough for numerical computations but limited in some ways that allow better optimizing. This is the reason why fortran code can be faster than C or C++ code.
The benchmarks where Fortran is much slower than C++ involve processes where most of the time is spent reading and writing data, for which Fortran is known to be slow. So, altogether, C++ is just as fast as Fortran and often a bit faster.
Fortran is easier to use than C/C++ for scientific computing Fortran, on the other hand, has had native support for multidimensional arrays, complex numbers, and even some special functions, almost from the very beginning. Fortran also introduced the original array syntax that was later employed by Matlab.
Fortran is a general-purpose computer programming language that is especially suited to scientific computing problems. The high-level programming language maintains popularity due to its speed. Fortran programs run at high speeds, which are only comparable to a handful of languages such as C++.
You've got a-priori ideas about what to do, and some of them might actually help,
but the biggest payoff is in a-posteriori anaylsis.
(Added: In other words, getting a*b*c
into a different order might save a couple cycles (which I doubt), while at the same time you don't know you're not getting blind-sided by something spending 1000 cycles for no good reason.)
No matter how carefully you code it, there will be opportunities for speedup that you didn't foresee. Here's how I find them. (Some people consider this method controversial).
It's best to start with optimization flags OFF when you do this, so the code isn't all scrambled. Later you can turn them on and let the compiler do its thing.
Get it running under a debugger with enough of a workload so it runs for a reasonable length of time. While it's running, manually interrupt it, and take a good hard look at what it's doing and why. Do this several times, like 10, so you don't draw erroneous conclusions about what it's spending time at.
Here's examples of things you might find:
If you do this entire operation two or three times, you will have removed the stupid stuff that finds its way into any software when it's first written. After that, you can turn on the optimization, parallelism, or whatever, and be confident no time is being spent on silly stuff.
I second the advice that these tricks that you have been taught are silly in this era. Compilers do this for you now; such micro-optimizations are unlikely to make a significant difference and may not be portable. Write clear & understandable code. Carefully select your algorithm. One thing that can make a difference is using indices of multi-dimensions arrays in the correct order ... recasting an M X N array to N X M can help depending on the pattern of data access by your program. After this, if your program is too slow, measure where the CPU is consumed and improve only those parts. Experience shows that guessing is frequently wrong and leads to writing more opaque code for nor reason. If you make a code section in which your program spends 1% of its time twice as fast, it won't make any difference.
Here are previous answers on FORALL and WHERE: How can I ensure that my Fortran FORALL construct is being parallelized? and Do Fortran 95 constructs such as WHERE, FORALL and SPREAD generally result in faster parallel code?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With