I have a Fortran 90 program (Packmol) which, until recently, was implemented with static memory allocation.
I changed the code to use dynamic allocation, in such a way that all arrays are allocated at the beginning. I had a performance loss of 400% in some examples.
Then I verified that the problem persists even if the sizes of the arrays are the same as those I used with static allocation. That is, if I change the declarations from
double precision :: x(1000)
to something like
double precision, allocatable :: x(:)
allocate(x(1000))
that is enough to cause the performance loss. Of course, that is when it is done for all arrays that need to be allocated dynamically, of which there are about 30.
Is there a way to allocate the arrays more efficiently to reduce the performance penalty? Or does anyone have a different suggestion?
Thank you very much.
EDIT: Somehow the problem was solved. The dynamic version is now only slightly slower than the static version, which is expected. I do not really know what was causing the major slowdown before.
There can be many reasons for such a performance loss:
1) Static arrays are typically placed in the BSS segment (see Where are static variables stored (in C/C++)?), whereas allocatable arrays can be allocated on the heap or on the stack. Allocation on the stack is much faster than on the heap. A good compiler can generate code that will allocate as much as possible on the stack.
2) You may have allocate/deallocate statements in loops. Every memory allocation takes some time. A good compiler can avoid physically allocating memory at every allocation and instead reuse space that has been deallocated.
3) With static arrays, the compiler knows the dimensions at compile time, so it can perform some additional optimizations.
4) If you have multi-dimensional arrays whose dimensions are only known at run time, the calculation of the address of the elements can't be done at compile time. For example, the offset of A(5,6,7) from the start of the array is (5-1) + (6-1)*n1 + (7-1)*n1*n2, where n1 and n2 are the first two dimensions of A: A(n1,n2,n3). For static arrays, the compiler can optimize this part. Moreover, if a dimension n1, n2, ... is a power of 2, the compiler will generate a bit shift instead of an integer multiply, which can be about 3x faster.
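The address arithmetic in point 4) can be sketched as a small function. This is only an illustration, assuming default lower bounds of 1 and Fortran's column-major storage; the names offset_demo and offset3 are made up for this example and do not come from Packmol.

```fortran
module offset_demo
  implicit none
contains
  ! Zero-based offset of A(i,j,k) from A(1,1,1) for an array declared
  ! as A(n1,n2,n3) with default lower bounds of 1 (column-major order).
  ! Hypothetical helper, for illustration only.
  pure integer function offset3(i, j, k, n1, n2) result(off)
    integer, intent(in) :: i, j, k   ! element indices
    integer, intent(in) :: n1, n2    ! first two dimensions of A
    off = (i - 1) + (j - 1)*n1 + (k - 1)*n1*n2
  end function offset3
end module offset_demo
```

When n1 and n2 are compile-time constants the compiler can fold the multiplications into constants; when they come from an allocate at run time they have to be loaded and multiplied for each access, unless the optimizer can hoist that work out of the loop.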
Number 3) is the most probable. You can keep static arrays where you know a reasonable upper bound at compile time and the arrays are relatively small (roughly fewer than 1000 elements), and also inside routines that are called very often and do a very small amount of work.
As a rule of thumb, only small arrays can be statically allocated: most of the 1D arrays, some small 2D arrays and tiny 3D arrays. Convert all the rest to dynamic allocation, as they will probably not fit on the stack.
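As a rough sketch of that advice, a small routine that is called very often can keep a fixed-size scratch array sized by a compile-time upper bound. The names small_work_demo, max_n and sum_squares, and the bound of 64, are assumptions chosen for the example.

```fortran
module small_work_demo
  implicit none
  integer, parameter :: max_n = 64   ! hypothetical compile-time upper bound
contains
  ! Sum of squares of v(1:n) using a fixed-size scratch array, so no
  ! allocate/deallocate is executed on each call.
  function sum_squares(v, n) result(s)
    integer, intent(in) :: n
    double precision, intent(in) :: v(n)
    double precision :: s
    double precision :: scratch(max_n)   ! size known at compile time
    if (n > max_n) error stop 'n exceeds max_n'
    scratch(1:n) = v(1:n)**2
    s = sum(scratch(1:n))
  end function sum_squares
end module small_work_demo
```

The price is the explicit bound check: if the real problem size can exceed max_n, the array has to become allocatable after all.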
If you have frequent allocate/deallocate calls because you call a subroutine in a loop, such as this:
do i=1,10000000
  call work(a,b)
end do

subroutine work(a,b)
  ...
  allocate (c)
  ...
  deallocate (c)
end
if c always has the same dimensions, you can pass it as an argument to the subroutine, or make it a global variable that is allocated only once before calling work:
use module_where_c_is_defined
allocate (c)
do i=1,10000000
  call work(a,b)
end do
deallocate(c)

subroutine work(a,b)
  use module_where_c_is_defined
  if (.not.allocated(c)) then
    stop 'c is not allocated'
  endif
  ...
end
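A compilable version of this pattern might look like the following. It is a minimal sketch: the module name workspace_mod, the driver run, the array sizes and the placeholder computation inside work are all assumptions, since the real body of work is elided above.

```fortran
module workspace_mod
  implicit none
  double precision, allocatable :: c(:)   ! shared scratch array
contains
  ! Uses the shared scratch array c instead of allocating its own.
  subroutine work(a, b)
    double precision, intent(in)  :: a(:)
    double precision, intent(out) :: b
    if (.not. allocated(c)) error stop 'c is not allocated'
    if (size(c) < size(a)) error stop 'c is too small'
    c(1:size(a)) = 2.0d0*a        ! placeholder computation
    b = sum(c(1:size(a)))
  end subroutine work

  ! Hypothetical driver: allocates c once, calls work niter times,
  ! then deallocates c once.
  subroutine run(n, niter, total)
    integer, intent(in) :: n, niter
    double precision, intent(out) :: total
    double precision :: a(n), b
    integer :: i
    a = 1.0d0
    total = 0.0d0
    allocate(c(n))                ! one allocation for the whole loop
    do i = 1, niter
      call work(a, b)
      total = total + b
    end do
    deallocate(c)                 ! one deallocation after the loop
  end subroutine run
end module workspace_mod
```

Passing c as an explicit argument to work instead of sharing it through the module would avoid the global state, at the cost of a longer argument list.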