Profiling one of our Fortran codes shows that two subroutines take up most of the computational time (22.1% and 17.2%). In each routine, ~5% of the time is spent allocating and freeing memory. These routines look like
MODULE foo
CONTAINS
  SUBROUTINE bar( ... )
    ...
    IMPLICIT NONE
    ...
    REAL, ALLOCATABLE, DIMENSION(:,:) :: work
    ...
    ALLOCATE (work(size1,size2))
    ...
    DEALLOCATE (work)
  END SUBROUTINE bar
  ...
END MODULE foo
These subroutines get called on the order of 4000-5000 times in my benchmark, so I would like to get rid of the ALLOCATE and DEALLOCATE. The first approach is to change these to automatic arrays.
MODULE foo
CONTAINS
  SUBROUTINE bar( ... )
    ...
    IMPLICIT NONE
    ...
    REAL, DIMENSION(size1,size2) :: work
    ...
  END SUBROUTINE bar
  ...
END MODULE foo
This changes the resulting profile to
Running Time          Symbol Name
20955.0ms   17.0%     __totzsp_mod_MOD_totzsps
    7.0ms    0.0%       malloc
    5.0ms    0.0%       free
    2.0ms    0.0%       user_trap
16192.0ms   13.2%     __tomnsp_mod_MOD_tomnsps
   20.0ms    0.0%       free
    3.0ms    0.0%       malloc
    1.0ms    0.0%       szone_size_try_large
It looks like gfortran is allocating these on the stack and not the heap, but I'm concerned about what happens when these arrays become too large.
The second approach that I'm taking is to allocate and deallocate these arrays once.
work_array.f
MODULE work_array
  IMPLICIT NONE
  REAL(rprec), ALLOCATABLE, DIMENSION(:,:) :: work
END MODULE work_array
I allocate these once in a different part of the code. Now my subroutine looks like
MODULE foo
CONTAINS
  SUBROUTINE bar( ... )
    ...
    USE work_array
    IMPLICIT NONE
    ...
  END SUBROUTINE bar
  ...
END MODULE foo
However, when I run the code now, the profile gets worse.
Running Time          Symbol Name
30584.0ms   21.6%     __totzsp_mod_MOD_totzsps
 3494.0ms    2.4%       free
 3143.0ms    2.2%       malloc
   27.0ms    0.0%       DYLD-STUB$$malloc_zone_malloc
   19.0ms    0.0%       szone_free_definite_size
    6.0ms    0.0%       malloc_zone_malloc
24325.0ms   17.1%     __tomnsp_mod_MOD_tomnsps
 2937.0ms    2.0%       free
 2456.0ms    1.7%       malloc
   23.0ms    0.0%       DYLD-STUB$$malloc_zone_malloc
    3.0ms    0.0%       szone_free_definite_size
Where are these extra mallocs and frees coming from? How can I set this up so that I allocate these arrays only once?
Since the work array is only used inside the bar subroutine, you could add the SAVE attribute to it and allocate it when the subroutine is called for the first time. If size1 or size2 differs from the previous call, you can just reallocate the array in that case.
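A minimal sketch of this idea follows. It assumes size1 and size2 are passed in as arguments (the question elides the real argument list), and the body that fills work and returns total is only a placeholder for the real computation:

```fortran
MODULE foo
   IMPLICIT NONE
CONTAINS
   SUBROUTINE bar(size1, size2, total)
      INTEGER, INTENT(IN) :: size1, size2   ! assumed arguments
      REAL, INTENT(OUT)   :: total          ! placeholder result
      ! SAVE keeps the allocation alive between calls.
      REAL, ALLOCATABLE, DIMENSION(:,:), SAVE :: work

      ! Drop the buffer only if the requested shape has changed.
      IF (ALLOCATED(work)) THEN
         IF (SIZE(work,1) /= size1 .OR. SIZE(work,2) /= size2) DEALLOCATE (work)
      END IF
      ! Allocate on the first call, or after a shape change.
      IF (.NOT. ALLOCATED(work)) ALLOCATE (work(size1,size2))

      work = 1.0            ! placeholder for the real computation
      total = SUM(work)
   END SUBROUTINE bar
END MODULE foo

PROGRAM demo
   USE foo
   IMPLICIT NONE
   REAL :: total
   INTEGER :: i
   DO i = 1, 5000
      CALL bar(10, 20, total)   ! only the first call allocates
   END DO
   PRINT *, total
END PROGRAM demo
```

Note that a saved array is shared by every caller of the routine, so this pattern is probably not safe as written if bar is called from several threads at once.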
This does leave the problem of deallocation once the subroutine is no longer needed. If you need to call it during the whole lifetime of the program, it's no problem, since the OS should reclaim the memory when the program quits. On the other hand, if you only need it during initialization, the memory will remain allocated even when it is no longer needed. If memory usage is a problem, maybe you can add an argument to the subroutine which tells it to deallocate the work array.
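One way this could look, as a rough sketch: the optional cleanup argument is a hypothetical name, size1 and size2 are assumed to be arguments, and resizing is left out to keep the example short.

```fortran
MODULE foo
   IMPLICIT NONE
CONTAINS
   SUBROUTINE bar(size1, size2, cleanup)
      INTEGER, INTENT(IN) :: size1, size2          ! assumed arguments
      LOGICAL, INTENT(IN), OPTIONAL :: cleanup     ! hypothetical flag
      REAL, ALLOCATABLE, DIMENSION(:,:), SAVE :: work

      ! Nested IFs: Fortran does not guarantee short-circuit evaluation,
      ! so cleanup must not be read unless it is PRESENT.
      IF (PRESENT(cleanup)) THEN
         IF (cleanup) THEN
            IF (ALLOCATED(work)) DEALLOCATE (work)
            RETURN
         END IF
      END IF

      IF (.NOT. ALLOCATED(work)) ALLOCATE (work(size1,size2))
      work = 0.0   ! placeholder for the real computation
   END SUBROUTINE bar
END MODULE foo

PROGRAM demo
   USE foo
   IMPLICIT NONE
   CALL bar(10, 20)                    ! allocates work on first use
   CALL bar(10, 20)                    ! reuses the saved array
   CALL bar(0, 0, cleanup=.TRUE.)      ! frees work and returns at once
END PROGRAM demo
```

Because cleanup is optional, existing calls to bar would not need to change; only the one place that knows the routine is finished has to pass it.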